On 12/05/2015 05:45 PM, Zefram wrote:
> Karl Williamson quoted:
>> One can have UTF-8 in identifier names,
>
> Taking "UTF-8" to mean "non-ASCII characters", this is not (and never
> has been) consistently true. The de facto identifier syntax varies
> according to source encoding. This is one of the remaining instances
> of the Unicode bug.
>
> $ perl -e '$a="\$a\xc0b; 1"; utf8::downgrade($a); eval $a; print $@ || "OK\n"'
> Unrecognized character \xC0; marked by <-- HERE after $a<-- HERE near column 3 at (eval 1) line 1.
> $ perl -e '$a="\$a\xc0b; 1"; utf8::upgrade($a); eval $a; print $@ || "OK\n"'
> OK
>
> It is possible to have those characters *if represented in UTF-8 in the
> source*, but it's not a consistent feature of the language grammar.
>
>> but not in package/class or
>> subroutine names.
>
> It was never true that these contexts differed from other kinds of
> identifier. From 5.8.0 onwards, up to 5.22, all of these kinds of
> identifier can have non-ASCII characters if the source is in upgraded
> form, and cannot if the source is downgraded.
>
>> While some limited functionality towards this does
>> exist as of Perl 5.8.0, that is more accidental than designed; use of
>> UTF-8 for the said purposes is unsupported.
>
> Yes, it does look accidental, and we do not in fact support it.
>
>> One reason of this unfinishedness is its (currently) inherent
>> unportability: since both package names and subroutine names may need
>> to be mapped to file and directory names, the Unicode capability of
>> the filesystem becomes important-- and there unfortunately aren't
>> portable answers.
>
> This is indeed another issue that has never been addressed and would
> need to be resolved in order to fully support non-ASCII identifiers.
> Package names are mapped to pathnames for module loading, and I believe
> subroutine names for the old system of module splitting.
> We essentially need a name mangling layer, to map language-supported
> identifiers to inoffensive filenames. (Really, we already need a name
> mangling layer to cope with case-insensitive filesystems, but in
> practice we just put up with the resulting misbehaviour.)
>
> -zefram
>

The text in question is referring to behavior under the 'use utf8'
pragma. So it really does mean strict UTF-8, and not just non-ASCII.

It appears that UTF-8 is supported in subroutine and package names. How
does the following text look as a substitute for what is there now?

BUGS

Some filesystems may not support UTF-8 file names, or they may be
supported incompatibly with Perl. Therefore UTF-8 names that are
visible to the filesystem, such as module names, may not work.
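
To make the "name mangling layer" idea from the quoted message concrete, here is a minimal sketch of one way such a mapping could look. Nothing like this exists in core Perl. The function name mangle_package_name and the _XX escaping scheme are assumptions made up purely for illustration, and the sketch does not attempt to address case-insensitive filesystems.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of Perl): map a package name to an
# ASCII-only relative path, so that the filesystem never sees a
# non-ASCII byte.
sub mangle_package_name {
    my ($package) = @_;
    my @parts = map {
        # Work on the UTF-8 encoded bytes of each name component.
        my $bytes = encode('UTF-8', $_);
        # Escape every byte outside [A-Za-z0-9] as _XX (two hex digits).
        # The underscore itself is escaped too, keeping the mapping
        # unambiguous on a case-sensitive filesystem.
        $bytes =~ s/([^A-Za-z0-9])/sprintf('_%02X', ord($1))/ge;
        $bytes;
    } split /::/, $package;
    return join('/', @parts) . '.pm';
}

print mangle_package_name('Foo::Bar'), "\n";
print mangle_package_name("Caf\x{e9}::Menu"), "\n";
```

Plain ASCII names come out as the usual path (Foo/Bar.pm), while a non-ASCII character is replaced by the escaped bytes of its UTF-8 encoding. A real design would also have to decide how to treat names that differ only in case.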