develooper Front page | perl.perl5.porters | Postings from June 2020

Re: Announcing Perl 7

From: Salvador Fandiño
Date: June 30, 2020 09:48
Subject: Re: Announcing Perl 7
Message ID: cfa83790-f6a3-a17d-1391-0d790cff81d0@gmail.com

On 30/6/20 1:09, Eric Brine wrote:
 > On Sat, Jun 27, 2020 at 12:46 PM Dan Book <grinnz@gmail.com
 > <mailto:grinnz@gmail.com>> wrote:
 >
 >     unicode_strings causes a specific set of functions in that lexical
 >     scope to use Unicode rules when determining how they interact with a
 >     string, instead of possibly using ASCII rules if the string is
 >     downgraded as the previous heuristic did.
 >
 >
 > Or put otherwise, it simply fixes a handful of builtin functions that
 > otherwise suffer from The Unicode Bug (behave differently depending on
 > the internal storage format of their input).
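
A minimal sketch of what that means in practice, assuming no locale 
pragma is in effect. The variable names are only illustrative:

```perl
use strict;
use warnings;

my $native   = "\xe9";       # e-acute, stored internally as a single byte
my $upgraded = "\xe9";
utf8::upgrade($upgraded);    # same character, UTF-8 internal storage

# Without the feature, uc() depends on the internal storage format.
my $uc_native   = uc $native;      # ASCII rules: left as "\xe9"
my $uc_upgraded = uc $upgraded;    # Unicode rules: becomes "\xc9"

my $uc_fixed;
{
    use feature 'unicode_strings';
    $uc_fixed = uc $native;        # Unicode rules regardless of storage
}

printf "same input:      %s\n", $native eq $upgraded       ? "yes" : "no";
printf "plain uc agrees: %s\n", $uc_native eq $uc_upgraded ? "yes" : "no";
printf "with feature:    %s\n", $uc_fixed eq $uc_upgraded  ? "yes" : "no";
```

The two inputs compare equal, yet plain uc() gives different results 
for them. Inside the unicode_strings scope the result no longer depends 
on how the string happens to be stored.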

For functions that interface with the external world, it is not as 
simple as saying they should expect the data to be in the native 
Unicode encoding (i.e. UTF-8 or UTF-16), especially if that is going to 
be the default behavior in Perl 7.

Nowadays, on Windows, Linux and probably most UNIX variants, UTF-8 (or 
UTF-16) is usually the default encoding for file system metadata, but 
the OS does nothing to enforce that. File names can still contain byte 
(or wchar_t) sequences that are not valid in that encoding.
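
To illustrate what "not valid" means here, a rough sketch that checks 
whether a file name, taken as raw bytes, is well-formed UTF-8. It uses 
the core Encode module; the helper name is_valid_utf8 and the sample 
names are made up for the example:

```perl
use strict;
use warnings;
use Encode ();

# Returns 1 if the byte string is well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its argument when CHECK is set
    my $ok = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

# "caf\xc3\xa9.txt" is "café.txt" encoded as UTF-8;
# "caf\xe9.txt" is the same name encoded as Latin-1.
for my $name ("caf\xc3\xa9.txt", "caf\xe9.txt") {
    printf "%d bytes: %s\n", length $name,
        is_valid_utf8($name) ? "valid UTF-8" : "not valid UTF-8";
}
```

On a file system that stores names as opaque bytes, both names can 
exist side by side in the same directory.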

In my experience, such broken names are not that rare. They come from 
buggy software, old data from the days when Latin-1 was still the norm, 
file systems with a fixed encoding, etc.

IMO, it would be a mistake to have perl throw an error when encountering 
any such data. On the contrary, it should be able to read, process and 
write it back untouched, end to end.

For instance:

    my $fn = readdir $dh;
    open my $fh, '>', "/tmp/$fn"
        or die "Cannot create /tmp/$fn: $!";

This should be able to read a file name containing invalid sequences 
and create a new file with exactly the same broken name.

Doing otherwise would leave the programmer with the burden of 
explicitly handling those cases.
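
A self-contained sketch of that round trip, assuming a file system that 
stores names as opaque bytes (e.g. ext4 on Linux; others may reject the 
name). The temporary directories and the sample name are just for 
illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $src = tempdir(CLEANUP => 1);
my $dst = tempdir(CLEANUP => 1);

# A Latin-1 encoded name: the byte \xe9 is not valid UTF-8 here.
my $broken = "caf\xe9.txt";
open my $out, '>', "$src/$broken" or die "Cannot create source file: $!";
close $out;

# Read the name back and create a file with the same name elsewhere.
opendir my $dh, $src or die "Cannot open $src: $!";
my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;

for my $fn (@names) {
    open my $fh, '>', "$dst/$fn" or die "Cannot create $dst/$fn: $!";
    close $fh;
}

# Check that the copy carries exactly the same bytes.
opendir my $dh2, $dst or die "Cannot open $dst: $!";
my @copied = grep { $_ ne '.' && $_ ne '..' } readdir $dh2;
closedir $dh2;

print "round trip ", ($copied[0] eq $broken ? "preserved" : "changed"),
    " the name\n";
```

This works because the name is passed through as bytes and never 
decoded. The point is that it should keep working if decoding becomes 
the default.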

And BTW, Raku already does that using the UTF8-C8 encoding: 
https://docs.raku.org/language/unicode
