perl.perl5.porters
Postings from January 2016
Re: Obsolete text in utf8.pm
From: Karl Williamson
January 7, 2016 16:38
Message ID: 568E927B.email@example.com
In the absence of any feedback, I pushed the proposed text as
On 12/23/2015 11:54 AM, Karl Williamson wrote:
> On 12/05/2015 05:45 PM, Zefram wrote:
>> Karl Williamson quoted:
>>> One can have UTF-8 in identifier names,
>> Taking "UTF-8" to mean "non-ASCII characters", this is not (and never
>> has been) consistently true. The de facto identifier syntax varies
>> according to source encoding. This is one of the remaining instances
>> of the Unicode bug.
>> $ perl -e '$a="\$a\xc0b; 1"; utf8::downgrade($a); eval $a; print $@ || "ok\n"'
>> Unrecognized character \xC0; marked by <-- HERE after $a<-- HERE near
>> column 3 at (eval 1) line 1.
>> $ perl -e '$a="\$a\xc0b; 1"; utf8::upgrade($a); eval $a; print $@ || "ok\n"'
>> It is possible to have those characters *if represented in UTF-8 in the
>> source*, but it's not a consistent feature of the language grammar.
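A minimal Perl sketch of why the one-liners above count as an instance of the Unicode bug: the downgraded and upgraded copies of the source string are equal as Perl strings and differ only in their internal representation (the UTF8 flag). The variable names are illustrative. utf8::upgrade, utf8::downgrade and utf8::is_utf8 are perl's built-in utf8:: functions, available without "use utf8".

```perl
use strict;
use warnings;

# The same source text as in the one-liners: "$a", U+00C0, then "b; 1".
my $src = "\$a\xc0b; 1";

my $down = $src;
utf8::downgrade($down);    # stored internally as one byte per character
my $up = $src;
utf8::upgrade($up);        # stored internally as UTF-8

# As Perl strings the two copies are identical...
my $same = ($down eq $up) && length($down) == length($up);

# ...and only the internal flag differs, yet eval parses them differently.
printf "equal: %s; UTF8 flag: downgraded=%s upgraded=%s\n",
    ($same ? "yes" : "no"),
    (utf8::is_utf8($down) ? "on" : "off"),
    (utf8::is_utf8($up)   ? "on" : "off");
```

Behaviour that depends on that flag rather than on the characters themselves is exactly what perlunicode calls the Unicode bug.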
>>> but not in package/class or
>>> subroutine names.
>> It was never true that these contexts differed from other kinds of
>> identifier. From 5.8.0 onwards, up to 5.22, all of these kinds of
>> identifier can have non-ASCII characters if the source is in upgraded
>> form, and cannot if the source is downgraded.
>>> While some limited functionality towards this does
>>> exist as of Perl 5.8.0, that is more accidental than designed; use of
>>> UTF-8 for the said purposes is unsupported.
>> Yes, it does look accidental, and we do not in fact support it.
>>> One reason of this unfinishedness is its (currently) inherent
>>> unportability: since both package names and subroutine names may need
>>> to be mapped to file and directory names, the Unicode capability of
>>> the filesystem becomes important-- and there unfortunately aren't
>>> portable answers.
>> This is indeed another issue that has never been addressed and would
>> need to be resolved in order to fully support non-ASCII identifiers.
>> Package names are mapped to pathnames for module loading, and I believe
>> subroutine names for the old system of module splitting. We essentially
>> need a name mangling layer, to map language-supported identifiers to
>> inoffensive filenames. (Really, we already need a name mangling layer
>> to cope with case-insensitive filesystems, but in practice we just put
>> up with the resulting misbehaviour.)
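A name mangling layer of the kind described above might look roughly like the following Perl sketch. Only the Foo::Bar to Foo/Bar.pm mapping reflects what require does; the escaping scheme and the functions mangle_component, unmangle_component and mangled_module_path are hypothetical illustrations, not anything perl provides.

```perl
use strict;
use warnings;

# Hypothetical scheme: keep lowercase ASCII letters and digits as-is and
# encode every other character (uppercase, '_', non-ASCII) as "_<hex>_".
# The result contains no uppercase letters and no raw underscores, so it
# is reversible and distinct names stay distinct on case-insensitive
# filesystems.
sub mangle_component {
    my ($name) = @_;
    (my $out = $name) =~ s/([^a-z0-9])/sprintf('_%x_', ord $1)/ge;
    return $out;
}

# Inverse of mangle_component: turn each "_<hex>_" back into its character.
sub unmangle_component {
    my ($mangled) = @_;
    (my $out = $mangled) =~ s/_([0-9a-f]+)_/chr hex $1/ge;
    return $out;
}

# Apply the mangling to each "::"-separated component, then build the
# path the way require does: components joined by "/", plus ".pm".
sub mangled_module_path {
    my ($package) = @_;
    return join('/', map { mangle_component($_) } split /::/, $package) . '.pm';
}

print mangled_module_path("Foo::Bar"), "\n";    # prints "_46_oo/_42_ar.pm"
```

The cost of this particular choice is unreadable filenames; any real scheme would need to balance readability against portability.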
> The text in question refers to code under the 'use utf8' pragma, so it
> really does mean strict UTF-8, and not just non-ASCII.
> It appears that UTF-8 is supported in subroutine and package names. How
> does the following text look as a substitute for what is there now?
> Some filesystems may not support UTF-8 file names, or may support them
> in a way that is incompatible with Perl. Therefore UTF-8 names that are
> visible to the filesystem, such as module names, may not work.
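To illustrate the proposed wording, here is a Perl sketch of what a UTF-8 module name becomes once it reaches the filesystem. module_relpath is a hypothetical helper mirroring require's Foo::Bar to Foo/Bar.pm mapping, and the package name is made up. Encode and Unicode::Normalize are core modules. The NFD step stands in for a filesystem that stores decomposed names (Apple's HFS+ stores an NFD-like form), which is one way a filesystem can support UTF-8 names incompatibly with Perl.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFC NFD);

# Mirror require's mapping for a bareword module name:
# "::" becomes "/" and ".pm" is appended (Foo::Bar -> Foo/Bar.pm).
sub module_relpath {
    my ($package) = @_;
    (my $path = $package) =~ s{::}{/}g;
    return "$path.pm";
}

# Hypothetical package name "Ñandú", as it might be written under
# "use utf8"; spelled with escapes so this file itself stays ASCII.
my $relpath = module_relpath("\x{d1}and\x{fa}");

# The bytes a filesystem would have to store if it uses UTF-8 names.
my $bytes = encode('UTF-8', $relpath);

# A filesystem that stores decomposed names may hand back a different
# code point sequence from the one perl asked for.
my $decomposed = NFD($relpath);

printf "%s: %d bytes as UTF-8; NFD form equal? %s; NFC restores it? %s\n",
    $bytes, length($bytes),
    ($decomposed eq $relpath ? "yes" : "no"),
    (NFC($decomposed) eq $relpath ? "yes" : "no");
```

The script should report 10 bytes, and that the decomposed name is not equal to the original but normalizes back to it. A lookup that compares names with a plain eq would therefore miss the file.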