Re: "use v5.36.0" should imply UTF-8 encoded source

Dan Book
July 30, 2021 17:55
On Fri, Jul 30, 2021 at 1:48 PM Leon Timmermans wrote:

> On Fri, Jul 30, 2021 at 6:56 PM Felipe Gasper wrote:
>> FWIW, I think this will regress Perl’s usability.
>> Probably the worst part about character encoding in Perl is that nothing
>> indicates when you’ve over-encoded or under-encoded. But, at the very least
>> everything right now is consistent by default: source code is parsed as
>> bytes (“Latin-1”), and I/O happens as bytes. Thus, a “minimal-effort”
>> approach to writing Perl will at least minimize the odds of encoding
>> mismatches: you only run into trouble if you explicitly decode/encode.
>> If `use v5.36` is to disrupt that consistency by making source code
>> UTF-8-decoded but *leaving* I/O as bytes, this seems likely to add another
>> “shin-bumper” to use of Perl that doesn’t happen in languages that type
>> byte strings differently from text strings.
>> So quick-and-simple things like `print "é"` will now, in “modern” Perl,
>> break, with no indication of where/why until a human being comes along,
>> notices the problem, and puts in the time to debug it.
> It doesn't actually break. PerlIO will try to downgrade that for a
> non-:utf8 handle, or upgrade for a :utf8 handle.

It won't break in implementation, but it will break in logic. It will print
the single ISO-8859-1 byte, whereas currently it prints the UTF-8 encoded
bytes, since that is how the string started out in the source. (String
operations on that UTF-8-encoded string within the code would also be wrong,
but that doesn't always matter.)
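
A minimal sketch of that difference, under a few assumptions: the source file is saved as UTF-8, so the literal "é" arrives as the two bytes 0xC3 0xA9; `Encode::decode` stands in for what `use utf8` would do to the literal; and an in-memory filehandle stands in for a handle with no encoding layer. The helper `printed_bytes` and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Bytes of the literal e-acute (U+00E9) as stored in a UTF-8 source file.
my $source_bytes = "\xC3\xA9";

# Current default: the literal stays as two bytes (two Latin-1 characters).
my $as_bytes = $source_bytes;

# With the source decoded as UTF-8: the literal becomes one character.
my $as_chars = decode('UTF-8', $source_bytes);

# Hypothetical helper: print to a handle with no encoding layer and
# return what was written as space-separated hex bytes.
sub printed_bytes {
    my ($string) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
    print {$fh} $string;
    close $fh;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $buffer;
}

printf "bytes literal:   length %d, prints %s\n",
    length($as_bytes), printed_bytes($as_bytes);
printf "decoded literal: length %d, prints %s\n",
    length($as_chars), printed_bytes($as_chars);
```

The bytes literal has length 2 and is written out unchanged as C3 A9, which a UTF-8 terminal shows as "é". The decoded literal has length 1 and is downgraded on output to the single byte E9, with no warning, which is not valid UTF-8 on its own.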

