
Re: "use v5.36.0" should imply UTF-8 encoded source

From:
Felipe Gasper
Date:
August 2, 2021 00:35
Subject:
Re: "use v5.36.0" should imply UTF-8 encoded source
Message ID:
BF204F33-F1DC-449E-972C-6C27C1DBAB25@felipegasper.com


> On Aug 1, 2021, at 10:23 AM, Leon Timmermans <fawaka@gmail.com> wrote:
> 
> Code is not binary, it is text. E.g.:
> 
> use 5.010;
> { no utf8;  say "éé" =~ /\N{LATIN SMALL LETTER E WITH ACUTE}/ ? "yes" : "no" };
> { use utf8; say "éé" =~ /\N{LATIN SMALL LETTER E WITH ACUTE}/ ? "yes" : "no" };
> 
> The status quo is only reasonable in that 95% of all code is actually ASCII, so it usually doesn't matter.

Code is indeed text, but this is not reasonable:

> perl -Mutf8 -e'print "é"'
�

… particularly in contrast to this:

> echo é | perl -Mutf8 -e 'print <>'
é

… and these:

> node -e 'console.log("é")'
é

> python -c 'print("é")'
é

> ruby -e 'puts "é"'
é

> echo '<?php print "é" ?>' | php
é

> echo | awk '{print "é"}'
é

> julia -e'print("é")'
é

> lua -e'print "é"'
é


For Unicode-aware applications it is indeed useful to auto-decode string literals, but is it really worth having Perl’s “modern default” be the exceptionally weird behaviour of making:

perl -E'print "¡Hola, mundo!"'

… *not* print the given text correctly?
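To make the mismatch concrete, here is a minimal sketch of what happens under an explicit `use utf8` (standing in for the proposed default). It assumes the file is saved as UTF-8; `bytes_printed` is a hypothetical helper, and it writes to an in-memory filehandle so the result does not depend on the terminal.

```perl
use strict;
use warnings;
use utf8;    # source literals are decoded into characters

# Hypothetical helper: print $text to an in-memory filehandle, optionally
# with a PerlIO layer, and return the octets written as hex.
sub bytes_printed {
    my ($text, $layer) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "cannot open in-memory handle: $!";
    binmode($fh, $layer) if defined $layer;
    print {$fh} $text;
    close($fh) or die "cannot close in-memory handle: $!";
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $buffer);
}

my $raw     = bytes_printed("é");                        # no encoding layer
my $encoded = bytes_printed("é", ':encoding(UTF-8)');    # explicit layer

print "no layer:    $raw\n";
print "UTF-8 layer: $encoded\n";
```

Without a layer the decoded character should go out as the single octet E9, which a UTF-8 terminal cannot display; with the layer it should go out as C3 A9. In other words, today the decode is automatic but the matching encode is left to the programmer.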

It just doesn’t seem a very workable “modern default”. How feasible, instead, would something like the following be:

------

1. Devote 2 bits of each SV to storing whether the PV is text or bytes:

    0 0 = unknown
    0 1 = text
    1 0 = bytes
    1 1 = reserved/unused

2. Create string::decode_utf8() and string::encode_utf8() built-ins that access those bits. (Or string::from('UTF-16LE', …) etc.)

3. Under `use experimental 'autoencode'` blocks, teach Perl to auto-encode text strings when printing them (or otherwise sending them to the OS). Such blocks would also imply `use utf8`.

4. Outside such blocks, any operations on the strings reset the bytes/text bits.
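
As a rough illustration of steps 1 to 4, the idea can be modelled in plain Perl. This is only a sketch: real SVs have no such field, so a tagged string is represented here as an array reference, and every name below (`tagged`, the `sketch_*` functions, the `KIND_*` constants) is hypothetical. The core Encode module does the actual conversion.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical 2-bit tag values, mirroring the table in step 1.
use constant { KIND_UNKNOWN => 0b00, KIND_TEXT => 0b01, KIND_BYTES => 0b10 };

# A tagged string is modelled as [ $kind, $payload ].
sub tagged { my ($kind, $payload) = @_; return [ $kind, $payload ] }

# Stand-in for the proposed string::decode_utf8(): octets in, text out.
sub sketch_decode_utf8 {
    my ($str) = @_;
    die "refusing to decode text\n" if $str->[0] == KIND_TEXT;
    my $octets = $str->[1];    # copy: Encode may modify its input when CHECK is set
    return tagged(KIND_TEXT, Encode::decode('UTF-8', $octets, Encode::FB_CROAK));
}

# Stand-in for the proposed string::encode_utf8(): text in, octets out.
sub sketch_encode_utf8 {
    my ($str) = @_;
    die "refusing to encode bytes\n" if $str->[0] == KIND_BYTES;
    return tagged(KIND_BYTES, Encode::encode('UTF-8', $str->[1]));
}

# Step 3: what an auto-encoding print would hand to the OS.
sub sketch_output_octets {
    my ($str) = @_;
    return $str->[0] == KIND_TEXT ? sketch_encode_utf8($str)->[1] : $str->[1];
}

# Step 4: an operation outside an autoencode block resets the tag.
sub sketch_concat_untracked {
    my ($left, $right) = @_;
    return tagged(KIND_UNKNOWN, $left->[1] . $right->[1]);
}

my $text = sketch_decode_utf8(tagged(KIND_BYTES, "\xC3\xA9"));
printf "%d character(s), %d octet(s) on output\n",
    length($text->[1]), length(sketch_output_octets($text));
```

The point of the tag is that mistakes such as encoding twice become detectable errors rather than silent mojibake; whether spare bits exist in the SV flags for this is a separate question the sketch does not address.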


Then, if and when that feature works, Perl can *really* up its game: better Windows support, JSON could fail if asked to encode binary or decode text, etc.


-F

