develooper Front page | perl.perl5.porters | Postings from August 2021

Re: "use v5.36.0" should imply UTF-8 encoded source

Felipe Gasper
August 2, 2021 00:35

> On Aug 1, 2021, at 10:23 AM, Leon Timmermans wrote:
> Code is not binary, it is text. E.g.:
> use 5.010;
> { no utf8;  say "éé" =~ /\N{LATIN SMALL LETTER E WITH ACUTE}/ ? "yes" : "no" };
> { use utf8; say "éé" =~ /\N{LATIN SMALL LETTER E WITH ACUTE}/ ? "yes" : "no" };
> The status quo is only reasonable in that 95% of all code is actually ASCII, so it usually doesn't matter.

Code is indeed text, but this is not reasonable (on a UTF-8 terminal it emits the lone byte 0xE9 rather than a UTF-8 “é”):

> perl -Mutf8 -e'print "é"'

… particularly in contrast to this:

> echo é | perl -Mutf8 -e 'print <>'

… and these:

> node -e 'console.log("é")'

> python -c 'print("é")'

> ruby -e 'puts "é"'

> echo '<?php print "é" ?>' | php

> echo | awk '{print "é"}'

> julia -e'print("é")'

> lua -e'print "é"'

For Unicode-aware applications it is indeed useful to auto-decode strings in the source, but is it really worth making Perl’s “modern default” the exceptionally weird behaviour in which:

> perl -E'print "¡Hola, mundo!"'

… does *not* print the given text correctly?
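
For anyone who wants to see the bytes involved, here is a minimal sketch of the mechanism. It assumes the script is saved as UTF-8; the helper `bytes_written` is a made-up name for illustration, and it prints to in-memory filehandles only so the output bytes can be inspected.

```perl
use strict;
use warnings;
use utf8;    # literals below are decoded as UTF-8 (assumes this file is saved as UTF-8)

# Hypothetical helper: return, as hex, the bytes print() emits through a PerlIO layer.
sub bytes_written {
    my ($layer, $string) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer) or die "open failed: $!";
    print {$fh} $string;
    close($fh) or die "close failed: $!";
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $buffer;
}

my $chars  = "é";                    # one character, U+00E9, because of `use utf8`
my $octets = do { no utf8; "é" };    # two undecoded octets, as in plain `perl -e`

printf "use utf8, no output layer:   %s\n", bytes_written('', $chars);
printf "use utf8, :encoding(UTF-8):  %s\n", bytes_written(':encoding(UTF-8)', $chars);
printf "no utf8,  no output layer:   %s\n", bytes_written('', $octets);
```

The first case is the one complained about above: the literal is decoded on the way in, but nothing encodes it on the way out. The other two round-trip, either because an output layer was added explicitly or because the octets were never decoded in the first place.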

It just doesn’t seem a very workable “modern default”. How feasible, instead, would something like the following be:


1. Devote 2 bits of each SV to storing whether the PV is text or bytes:

    0 0 = unknown
    0 1 = text
    1 0 = bytes
    1 1 = reserved/unused

2. Create string::decode_utf8() and string::encode_utf8() built-ins that access those bits. (Or string::from('UTF-16LE', …) etc.)

3. Under `use experimental 'autoencode'` blocks, teach Perl to auto-encode text strings when printing them (or otherwise sending them to the OS). Such blocks would also imply `use utf8`.

4. Outside such blocks, any operation on a string resets its bytes/text bits to “unknown” (0 0).
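
The four items above can be modelled roughly in pure Perl. This is only a sketch of the semantics, not of an implementation: the real proposal would keep the bits on the SV itself, and `TaggedString`, `bytes_for_output` and `concat_untracked` are hypothetical names invented here. Nothing below is an existing or proposed Perl API.

```perl
use strict;
use warnings;

# Hypothetical model only: an object wrapper stands in for the two per-SV bits.
package TaggedString;

use Encode ();
use constant { UNKNOWN => 0b00, TEXT => 0b01, BYTES => 0b10 };

sub new {
    my ($class, $value, $kind) = @_;
    return bless { value => $value, kind => $kind // UNKNOWN }, $class;
}

sub kind  { return $_[0]{kind} }
sub value { return $_[0]{value} }

# Item 2: decoding yields text; decoding something already tagged as text is an error.
sub decode_utf8 {
    my ($self) = @_;
    die "refusing to decode a text string\n" if $self->kind == TEXT;
    my $octets = $self->value;
    my $chars  = Encode::decode('UTF-8', $octets,
        Encode::FB_CROAK() | Encode::LEAVE_SRC());
    return TaggedString->new($chars, TEXT);
}

# Item 2: encoding yields bytes; encoding something already tagged as bytes is an error.
sub encode_utf8 {
    my ($self) = @_;
    die "refusing to encode a byte string\n" if $self->kind == BYTES;
    return TaggedString->new(Encode::encode('UTF-8', $self->value), BYTES);
}

# Item 3: inside an 'autoencode' block, text is encoded on output; anything else passes through.
sub bytes_for_output {
    my ($self) = @_;
    return $self->value unless $self->kind == TEXT;
    return $self->encode_utf8->value;
}

# Item 4: outside such blocks, any operation drops the tag back to unknown.
sub concat_untracked {
    my ($self, $other) = @_;
    return TaggedString->new($self->value . $other->value, UNKNOWN);
}

package main;

my $wire = TaggedString->new("\xC3\xA9", TaggedString::BYTES());
my $text = $wire->decode_utf8;
printf "%d character(s), %d byte(s) on output\n",
    length($text->value), length($text->bytes_for_output);
```

With tags like these, a JSON encoder could check `kind` and refuse byte strings outright, which is the sort of failure mentioned below.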

Then, if and when that feature works, Perl can *really* up its game: better Windows support, JSON modules that fail when asked to encode binary or decode text, etc.

