Re: Announcing Perl 7
From:
Salvador Fandiño
Date:
June 30, 2020 09:48
Subject:
Re: Announcing Perl 7
Message ID:
cfa83790-f6a3-a17d-1391-0d790cff81d0@gmail.com
On 30/6/20 1:09, Eric Brine wrote:
> On Sat, Jun 27, 2020 at 12:46 PM Dan Book <grinnz@gmail.com> wrote:
>
> unicode_strings causes a specific set of functions in that lexical
> scope to use Unicode rules when determining how they interact with a
> string, instead of possibly using ASCII rules if the string is
> downgraded as the previous heuristic did.
>
>
> Or put otherwise, it simply fixes a handful of builtin functions that
> otherwise suffer from The Unicode Bug (behave differently depending on
> the internal storage format of their input).
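For anyone who has not run into The Unicode Bug, here is a minimal sketch of the behavior being described. The sub names uc_legacy and uc_unicode are made up for the illustration, and it assumes a perl where unicode_strings is not enabled by default in the enclosing scope (i.e. no "use v5.12" or later, and no "use locale").

```perl
use strict;
use warnings;

# Two strings holding the same single character (U+00E9), stored differently.
my $downgraded = "\xE9";       # one byte per character internally
my $upgraded   = "\xE9";
utf8::upgrade($upgraded);      # same character, stored as UTF-8 internally

# uc() without the feature: the result depends on the storage format.
sub uc_legacy {
    my ($s) = @_;
    return uc $s;
}

# uc() with the feature: Unicode rules regardless of storage format.
sub uc_unicode {
    use feature 'unicode_strings';
    my ($s) = @_;
    return uc $s;
}

printf "same string:         %s\n", ($downgraded eq $upgraded ? "yes" : "no");
printf "legacy, downgraded:  %02X\n", ord uc_legacy($downgraded);   # E9
printf "legacy, upgraded:    %02X\n", ord uc_legacy($upgraded);     # C9
printf "unicode, downgraded: %02X\n", ord uc_unicode($downgraded);  # C9
printf "unicode, upgraded:   %02X\n", ord uc_unicode($upgraded);    # C9
```

The two inputs compare equal, yet only the upgraded one is upcased without the feature. That is the inconsistency unicode_strings removes.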
In the case of functions that interface with the external world, it is
not as easy as saying they should expect the data to be in the native
Unicode encoding (i.e. UTF-8 or UTF-16), especially if that is going to
be the default behavior in p7.
Nowadays, UTF-8 (on Linux and probably most UNIX variants) or UTF-16 (on
Windows) is usually the default encoding for the file system metadata, but
the OS does nothing to enforce that. Filenames can still contain byte
(or wchar_t) sequences that are not valid in that encoding.
In my experience, those broken names are not so rare, due to buggy
software, old data from times when latin1 was still the norm, file
systems with a fixed encoding, etc.
IMO, it would be a mistake to have perl throw an error when encountering
any such data. On the contrary, it should be able to read, process and
write it back untouched, end to end.
For instance:
    my $fn = readdir $dh;
    open my $fh, '>', "/tmp/$fn" or die "open: $!";
This should be able to read a broken filename from the directory and
create a new file with exactly the same broken name.
Doing otherwise would leave the programmer with the burden of explicitly
handling those cases.
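To make the point concrete, here is a rough sketch of that round trip as it works in Perl 5 today, where readdir returns raw bytes. The helper names mirror_names and is_valid_utf8 and the file name are invented for the example. It assumes a file system that accepts arbitrary bytes in names (typical on Linux); some file systems reject such names outright, in which case the initial create will die.

```perl
use strict;
use warnings;
use Encode ();
use File::Temp qw(tempdir);

# Create an empty file in $dst for every entry of $src, passing the names
# through as the raw byte strings that readdir returned.
sub mirror_names {
    my ($src, $dst) = @_;
    opendir my $dh, $src or die "opendir $src: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    for my $fn (@names) {
        open my $fh, '>', "$dst/$fn" or die "open $dst/$fn: $!";
        close $fh or die "close $dst/$fn: $!";
    }
    return @names;
}

# Returns 1 if the byte string decodes as strict UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval {
        Encode::decode('UTF-8', $bytes,
                       Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    } ? 1 : 0;
}

my $src = tempdir(CLEANUP => 1);
my $dst = tempdir(CLEANUP => 1);

# A Latin-1 e-acute byte: a plausible legacy name, but not valid UTF-8.
my $broken = "caf\xE9.txt";
open my $out, '>', "$src/$broken" or die "create $src/$broken: $!";
close $out;

my @copied = mirror_names($src, $dst);
print "valid UTF-8: ", (is_valid_utf8($copied[0]) ? "yes" : "no"), "\n";
print "round trip:  ", (-e "$dst/$broken" ? "ok" : "lost"), "\n";
```

A strict decode of the name, as in is_valid_utf8, fails here. The copy succeeds only because the name is never decoded, and that is the property that should survive any change of defaults.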
And BTW, Raku already does that using the UTF8-C8 encoding:
https://docs.raku.org/language/unicode