Re: Obsolete text in utf8.pm

From: Zefram
Date: December 6, 2015 00:45
Subject: Re: Obsolete text in utf8.pm
Message ID: 20151206004537.GU13455@fysh.org

Karl Williamson quoted:
>One can have UTF-8 in identifier names,

Taking "UTF-8" to mean "non-ASCII characters", this is not (and never
has been) consistently true.  The de facto identifier syntax varies
according to source encoding.  This is one of the remaining instances
of the Unicode bug.

$ perl -e '$a="\$a\xc0b; 1"; utf8::downgrade($a); eval $a; print $@ || "OK\n"'
Unrecognized character \xC0; marked by <-- HERE after $a<-- HERE near column 3 at (eval 1) line 1.
$ perl -e '$a="\$a\xc0b; 1"; utf8::upgrade($a); eval $a; print $@ || "OK\n"'
OK

It is possible to have those characters *if represented in UTF-8 in the
source*, but it's not a consistent feature of the language grammar.

>                                        but not in package/class or
>subroutine names.

It was never true that these contexts differed from other kinds of
identifier.  From 5.8.0 onwards, up to 5.22, all of these kinds of
identifier can have non-ASCII characters if the source is in upgraded
form, and cannot if the source is downgraded.
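
A minimal sketch of how that claim could be checked for each kind of
identifier, in the same spirit as the one-liners above.  The helper
try_both_forms and the sample snippets are illustrative assumptions, not
anything from the original message, and the results depend on the perl
version in use, so no expected output is shown.

```perl
use strict;
use warnings;

# Hypothetical helper: compile $src once in each internal encoding and
# report whether the string eval succeeded.
sub try_both_forms {
    my ($src) = @_;
    my %result;
    for my $form (qw(downgraded upgraded)) {
        my $copy = $src;
        if ($form eq 'upgraded') { utf8::upgrade($copy) }
        else                     { utf8::downgrade($copy) }
        # Each snippet ends in a true value, so a false result means
        # the eval failed (here, at compile time).
        $result{$form} = eval($copy) ? 'OK' : 'rejected';
    }
    return \%result;
}

# The same non-ASCII character (\xc0) in three identifier contexts.
my %samples = (
    'scalar variable' => "my \$a\xc0b; 1",
    'subroutine name' => "sub a\xc0b { } 1",
    'package name'    => "package A\xc0b; 1",
);

for my $kind (sort keys %samples) {
    my $r = try_both_forms($samples{$kind});
    print "$kind: downgraded=$r->{downgraded} upgraded=$r->{upgraded}\n";
}
```

If the three contexts really do behave alike, each line should report the
same pair of results.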

>                   While some limited functionality towards this does
>exist as of Perl 5.8.0, that is more accidental than designed; use of
>UTF-8 for the said purposes is unsupported.

Yes, it does look accidental, and we do not in fact support it.

>One reason of this unfinishedness is its (currently) inherent
>unportability: since both package names and subroutine names may need
>to be mapped to file and directory names, the Unicode capability of
>the filesystem becomes important-- and there unfortunately aren't
>portable answers.

This is indeed another issue that has never been addressed and would
need to be resolved in order to fully support non-ASCII identifiers.
Package names are mapped to pathnames for module loading, and I believe
subroutine names are too, by the old system of module splitting.  We essentially
need a name mangling layer, to map language-supported identifiers to
inoffensive filenames.  (Really, we already need a name mangling layer
to cope with case-insensitive filesystems, but in practice we just put
up with the resulting misbehaviour.)
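
To make the idea concrete, here is a rough sketch of what such a
mangling layer might look like.  The function name mangle_package_to_path
and the _xHEX_ escape scheme are purely hypothetical; nothing like this
exists in perl, and a real design would need much more thought.

```perl
use strict;
use warnings;

# Hypothetical mangling scheme: ASCII letters and digits pass through;
# every other character, including "_" itself so that the mapping stays
# reversible, is written as _xHEX_ using its code point.
sub mangle_package_to_path {
    my ($package) = @_;
    my @parts = split /::/, $package;
    for my $part (@parts) {
        $part =~ s/([^A-Za-z0-9])/sprintf('_x%X_', ord $1)/ge;
    }
    # Same layout that module loading uses today: Foo::Bar -> Foo/Bar.pm
    return join('/', @parts) . '.pm';
}

print mangle_package_to_path("Foo::Bar"), "\n";          # Foo/Bar.pm
print mangle_package_to_path("Caf\x{e9}::Menu"), "\n";   # Caf_xE9_/Menu.pm
```

This sketch only yields ASCII filenames; it does nothing about
case-insensitive filesystems, which would need uppercase letters to be
escaped as well.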

-zefram
