Re: Obsolete text in utf8.pm

From: Karl Williamson
Date: December 23, 2015 18:55
Subject: Re: Obsolete text in utf8.pm
Message ID: 567AEDEB.2070202@khwilliamson.com

On 12/05/2015 05:45 PM, Zefram wrote:
> Karl Williamson quoted:
>> One can have UTF-8 in identifier names,
>
> Taking "UTF-8" to mean "non-ASCII characters", this is not (and never
> has been) consistently true.  The de facto identifier syntax varies
> according to source encoding.  This is one of the remaining instances
> of the Unicode bug.
>
> $ perl -e '$a="\$a\xc0b; 1"; utf8::downgrade($a); eval $a; print $@ || "OK\n"'
> Unrecognized character \xC0; marked by <-- HERE after $a<-- HERE near column 3 at (eval 1) line 1.
> $ perl -e '$a="\$a\xc0b; 1"; utf8::upgrade($a); eval $a; print $@ || "OK\n"'
> OK
>
> It is possible to have those characters *if represented in UTF-8 in the
> source*, but it's not a consistent feature of the language grammar.
>
>>                                         but not in package/class or
>> subroutine names.
>
> It was never true that these contexts differed from other kinds of
> identifier.  From 5.8.0 onwards, up to 5.22, all of these kinds of
> identifier can have non-ASCII characters if the source is in upgraded
> form, and cannot if the source is downgraded.
>
>>                    While some limited functionality towards this does
>> exist as of Perl 5.8.0, that is more accidental than designed; use of
>> UTF-8 for the said purposes is unsupported.
>
> Yes, it does look accidental, and we do not in fact support it.
>
>> One reason of this unfinishedness is its (currently) inherent
>> unportability: since both package names and subroutine names may need
>> to be mapped to file and directory names, the Unicode capability of
>> the filesystem becomes important-- and there unfortunately aren't
>> portable answers.
>
> This is indeed another issue that has never been addressed and would
> need to be resolved in order to fully support non-ASCII identifiers.
> Package names are mapped to pathnames for module loading, and I believe
> subroutine names for the old system of module splitting.  We essentially
> need a name mangling layer, to map language-supported identifiers to
> inoffensive filenames.  (Really, we already need a name mangling layer
> to cope with case-insensitive filesystems, but in practice we just put
> up with the resulting misbehaviour.)
>
> -zefram
>

The text in question refers to identifiers under the 'use utf8' pragma.
So it really does mean strict UTF-8, and not just non-ASCII.
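
To make the 'use utf8' case concrete, here is a minimal sketch of a
UTF-8 package name and subroutine name.  It assumes the file is saved as
UTF-8 and a reasonably recent perl (the package-block syntax needs 5.14
or later).  The names Café and größe are made up for illustration.

```perl
use strict;
use warnings;
use utf8;    # tell perl the source text itself is UTF-8

# Hypothetical package and sub names containing non-ASCII characters.
package Café {
    sub größe { return 42 }
}

package main;

# Call the sub through its fully qualified UTF-8 name.
my $size = Café::größe();
print "$size\n";
```

If this prints the number rather than failing to compile, the
identifiers were accepted in both the package and subroutine positions.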

It appears that UTF-8 is supported in subroutine and package names.  How 
does the following text look as a substitute for what is there now?

BUGS

Some filesystems may not support UTF-8 file names, or they may support
them in a way that is incompatible with Perl.  Therefore UTF-8 names
that are visible to the filesystem, such as module names, may not work.
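
A rough sketch of why module names are the exposed case: a package name
is turned into a relative path for loading, and the bytes that end up on
disk depend on the encoding used.  The helper module_to_path below is
hypothetical and only imitates the "::" to "/" mapping; the two
encodings are just examples of what a filesystem might expect, not a
statement of what perl itself does on any platform.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Hypothetical helper: imitate the Foo::Bar -> Foo/Bar.pm mapping.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# Made-up module name with three non-ASCII characters.
my $path = module_to_path('Café::Größe');

# The same character string as two different byte strings.
my $utf8_bytes   = encode('UTF-8',      $path);
my $latin1_bytes = encode('ISO-8859-1', $path);

printf "characters: %d\n",   length $path;
printf "UTF-8 bytes: %d\n",  length $utf8_bytes;
printf "Latin-1 bytes: %d\n", length $latin1_bytes;
```

The two byte strings differ, so a file written under one convention will
not be found by a lookup that uses the other.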
