On 2/8/21 2:34, Felipe Gasper wrote:
>
>
>> On Aug 1, 2021, at 10:23 AM, Leon Timmermans <fawaka@gmail.com> wrote:
>>
>> Code is not binary, it is text. E.g.:
>>
>> use 5.010;
>> { no utf8; say "éé" =~ /\N{LATIN SMALL LETTER E WITH ACUTE}/ ? "yes" : "no" };
>> { use utf8; say "éé" =~ /\N{LATIN SMALL LETTER E WITH ACUTE}/ ? "yes" : "no" };
>>
>> The status quo is only reasonable in that 95% of all code is actually ASCII, so it usually doesn't matter.
>
> Code is indeed text, but this is not reasonable:
>
>> perl -Mutf8 -e'print "é"'
> �
>
> … particularly in contrast to this:
>
>> echo é | perl -Mutf8 -e 'print <>'
> é
>
> … and these:
>
>> node -e 'console.log("é")'
> é
>
>> python -c 'print("é")'
> é
>
>> ruby -e 'puts "é"'
> é
>
>> echo '<?php print "é" ?>' | php
> é
>
>> echo | awk '{print "é"}'
> é
>
>> julia -e'print("é")'
> é
>
>> lua -e'print "é"'
> é
>
>
> For Unicode-aware applications it is indeed useful to auto-decode the strings, but is it really worth making Perl’s “modern default” the exceptionally weird behaviour of making:
>
> perl -E'print "¡Hola, mundo!"'
>
> … *not* print the given text correctly?
>
> It just doesn’t seem a very workable “modern default”. How feasible, instead, would something like the following be:
>
> ------
>
> 1. Devote 2 bits of each SV to storing whether the PV is text or bytes:
>
> 0 0 = unknown
> 0 1 = text
> 1 0 = bytes
> 1 1 = reserved/unused

IMO that's not the correct way to approach the problem here.

Perl already has PerlIO that allows transparent encoding/decoding of data on some IO interfaces, and that support should be expanded to support all of them.

Otherwise you are asking the programmer to do that translation explicitly every time some data goes through any builtin doing IO, as in:

    mkdir do_encoding($dirname);

That doesn't make sense at all.

What we need is to add proper support for the transparent translation of data between the internal representation and the outside encoding everywhere.
And that means:

a) Adding this translation feature to all the builtins doing IO.

b) Adding a mechanism so that the developer can configure it (for instance, set the filesystem encoding).

c) Inferring sane defaults from the environment (UTF-8 has been the default encoding in most Linux/Unix systems for the last two decades, but Perl still expects latin1 from STDIO!).

And regarding (c), that's also why most of your examples above work. Your terminal sends and expects UTF-8 encoded data from perl, but perl expects/sends latin1. It just happens that in your examples it is consistently wrong, so the errors cancel out, but there are myriad other cases where it isn't.
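
To make the "consistently wrong" point concrete, here is a minimal sketch of what perl writes for the same character with and without an encoding layer, plus the explicit translation that a builtin like mkdir needs today. It only uses Encode, File::Temp and in-memory filehandles. The helper bytes_written and the directory name are made up for illustration, and the filesystem part assumes a system that stores names as the bytes it is given (typical on Linux).

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

# U+00E9 (e with acute) as one decoded character, written with an escape
# so this file stays pure ASCII.
my $char = "\x{e9}";

# Hypothetical helper: print $string to an in-memory filehandle opened
# with the given PerlIO layer and return the raw bytes that were written.
sub bytes_written {
    my ($layer, $string) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer) or die "open failed: $!";
    print {$fh} $string;
    close($fh) or die "close failed: $!";
    return $buffer;
}

# Decoded text, no layer: the single latin1 byte E9, which is not valid UTF-8.
my $no_layer = bytes_written('', $char);

# Decoded text, UTF-8 layer: the two bytes C3 A9 that a UTF-8 terminal expects.
my $with_layer = bytes_written(':encoding(UTF-8)', $char);

# Never decoded, never encoded: UTF-8 bytes pass through untouched, so the
# output looks right even though perl saw two latin1 characters.
my $passthrough = bytes_written('', encode('UTF-8', $char));

printf "%-12s %s\n", 'no layer:',    uc(unpack('H*', $no_layer));
printf "%-12s %s\n", 'utf8 layer:',  uc(unpack('H*', $with_layer));
printf "%-12s %s\n", 'passthrough:', uc(unpack('H*', $passthrough));

# Builtins that touch the filesystem have no layer at all, so the
# translation has to be spelled out by hand in both directions.
my $base    = tempdir(CLEANUP => 1);
my $dirname = "caf\x{e9}";                       # decoded text
my $path    = "$base/" . encode('UTF-8', $dirname);
mkdir($path) or die "mkdir failed: $!";

opendir(my $dh, $base) or die "opendir failed: $!";
my @names = map { decode('UTF-8', $_) } grep { !/\A\.\.?\z/ } readdir($dh);
closedir($dh);
print "round trip ok\n" if @names == 1 && $names[0] eq $dirname;
```

The three printf lines print E9, C3A9 and C3A9. The first case is the broken `perl -Mutf8 -e'print "é"'` example, and the third is why the same one-liner without utf8 appears to work: nothing is decoded on the way in and nothing is encoded on the way out.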