perl.perl5.porters | Postings from July 2021

Re: "use v5.36.0" should imply UTF-8 encoded source

From: Dan Book
Date: July 30, 2021 18:43
Subject: Re: "use v5.36.0" should imply UTF-8 encoded source
Message ID: CABMkAVXDZOwKMNc_Ns6Kvku92projrscTtqu0vx8fNCQibf4Ug@mail.gmail.com
On Fri, Jul 30, 2021 at 2:28 PM Leon Timmermans <fawaka@gmail.com> wrote:

> On Fri, Jul 30, 2021 at 7:56 PM Felipe Gasper <felipe@felipegasper.com>
> wrote:
>
>>
>>
>> > On Jul 30, 2021, at 1:48 PM, Leon Timmermans <fawaka@gmail.com> wrote:
>> >
>> > On Fri, Jul 30, 2021 at 6:56 PM Felipe Gasper <felipe@felipegasper.com>
>> wrote:
>> > FWIW, I think this will regress Perl’s usability.
>> >
>> > Probably the worst part about character encoding in Perl is that
>> nothing indicates when you’ve over-encoded or under-encoded. But, at the
>> very least everything right now is consistent by default: source code is
>> parsed as bytes (“Latin-1”), and I/O happens as bytes. Thus, a
>> “minimal-effort” approach to writing Perl will at least minimize the odds
>> of encoding mismatches: you only run into trouble if you explicitly
>> decode/encode.
>> >
>> > If `use v5.36` is to disrupt that consistency by making source code
>> UTF-8-decoded but *leaving* I/O as bytes, this seems likely to add another
>> “shin-bumper” to use of Perl that doesn’t happen in languages that type
>> byte strings differently from text strings.
>> >
>> > So quick-and-simple things like `print "é"` will now, in “modern” Perl,
>> break, with no indication of where/why until a human being comes along,
>> notices the problem, and puts in the time to debug it.
>> >
>> > It doesn't actually break. PerlIO will try to downgrade that for a
>> non-:utf8 handle, or upgrade for a :utf8 handle.
>>
>> It’ll downgrade it, but it won’t encode it, so you’ll get mojibake:
>>
>> > perl -Mutf8 -e'print "é"'
>> �
>>
>
> It will print mojibake as well if the script is latin-1 encoded. It's
> mojibake because the terminal is utf-8, but the IO handle is latin1.
>

The difference is the number of people affected, by orders of magnitude: far
fewer would accidentally run a latin1 script on a utf8 terminal than would run
a utf8 script on a utf8 terminal with "use utf8" and not understand that they
have to encode the output.

-Dan
