> On Aug 1, 2021, at 10:23 AM, Leon Timmermans <fawaka@gmail.com> wrote:
>
> Code is not binary, it is text. E.g.:
>
> use 5.010;
> { no utf8; say "éé" =~ /\N{LATIN SMALL LETTER E WITH ACUTE}/ ? "yes" : "no" };
> { use utf8; say "éé" =~ /\N{LATIN SMALL LETTER E WITH ACUTE}/ ? "yes" : "no" };
>
> The status quo is only reasonable in that 95% of all code is actually ASCII, so it usually doesn't matter.

Code is indeed text, but this is not reasonable:

> perl -Mutf8 -e'print "é"'
�

… particularly in contrast to this:

> echo é | perl -Mutf8 -e 'print <>'
é

… and these:

> node -e 'console.log("é")'
é
> python -c 'print("é")'
é
> ruby -e 'puts "é"'
é
> echo '<?php print "é" ?>' | php
é
> echo | awk '{print "é"}'
é
> julia -e'print("é")'
é
> lua -e'print "é"'
é

For Unicode-aware applications it is indeed useful to auto-decode the strings, but is it really worth making Perl’s “modern default” the exceptionally weird behaviour of making:

    perl -E'print "¡Hola, mundo!"'

… *not* print the given text correctly? It just doesn’t seem a very workable “modern default”.

How feasible, instead, would something like the following be:

------
1. Devote 2 bits of each SV to storing whether the PV is text or bytes:

    0 0 = unknown
    0 1 = text
    1 0 = bytes
    1 1 = reserved/unused

2. Create string::decode_utf8() and string::encode_utf8() built-ins that access those bits. (Or string::from('UTF-16LE', …) etc.)

3. Under `use experimental 'autoencode'` blocks, teach Perl to auto-encode text strings when printing them (or otherwise sending them to the OS). Such blocks would also imply `use utf8`.

4. Outside such blocks, any operations on the strings reset the bytes/text bits.

Then, once/if that feature works, Perl can *really* up its game: better Windows support, JSON could fail if asked to encode binary or decode text, etc.

-F
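
For what it's worth, here is a rough pure-Perl model of steps 1 to 4 above. It is only a sketch under several assumptions: the UNKNOWN/TEXT/BYTES constants and every helper name (tagged_decode_utf8, tagged_encode_utf8, octets_for_output, concat_untagged) are made up for illustration, the tag lives in a hash instead of in the SV, and the existing Encode module stands in for the proposed string:: built-ins.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical tags mirroring the proposed 2-bit field (not a real Perl API).
use constant { UNKNOWN => 0, TEXT => 1, BYTES => 2 };

# Hypothetical stand-in for string::decode_utf8(): octets in, tagged text out.
sub tagged_decode_utf8 {
    my ($octets) = @_;
    return { pv => decode('UTF-8', $octets), kind => TEXT };
}

# Hypothetical stand-in for string::encode_utf8(): characters in, tagged bytes out.
sub tagged_encode_utf8 {
    my ($chars) = @_;
    return { pv => encode('UTF-8', $chars), kind => BYTES };
}

# Step 3: what printing under "autoencode" might do.
# Text is encoded on the way out; bytes and unknown strings pass through as-is.
sub octets_for_output {
    my ($s) = @_;
    return $s->{kind} == TEXT ? encode('UTF-8', $s->{pv}) : $s->{pv};
}

# Step 4: outside an autoencode block, any operation resets the tag.
sub concat_untagged {
    my ($x, $y) = @_;
    return { pv => $x->{pv} . $y->{pv}, kind => UNKNOWN };
}

# "\xC2\xA1" is the UTF-8 encoding of the inverted exclamation mark.
my $text = tagged_decode_utf8("\xC2\xA1Hola, mundo!");
my $out  = octets_for_output($text);

# STDOUT has no encoding layer here, so the UTF-8 octets go out unchanged.
print "$out\n";
```

The point of the model is only that the decision to encode hangs off the string's own tag rather than off a layer on the filehandle, so a string tagged as text comes out as UTF-8 without the caller calling binmode or passing -C.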