On 23/12/2015 18:54, Karl Williamson wrote:
> On 12/05/2015 05:45 PM, Zefram wrote:
>> Karl Williamson quoted:
>>> One can have UTF-8 in identifier names,
>>
>> Taking "UTF-8" to mean "non-ASCII characters", this is not (and never
>> has been) consistently true. The de facto identifier syntax varies
>> according to source encoding. This is one of the remaining instances
>> of the Unicode bug.
>>
>> $ perl -e '$a="\$a\xc0b; 1"; utf8::downgrade($a); eval $a; print $@ || "OK\n"'
>> Unrecognized character \xC0; marked by <-- HERE after $a<-- HERE near
>> column 3 at (eval 1) line 1.
>> $ perl -e '$a="\$a\xc0b; 1"; utf8::upgrade($a); eval $a; print $@ || "OK\n"'
>> OK
>>
>> It is possible to have those characters *if represented in UTF-8 in the
>> source*, but it's not a consistent feature of the language grammar.
>>
>>> but not in package/class or
>>> subroutine names.
>>
>> It was never true that these contexts differed from other kinds of
>> identifier. From 5.8.0 onwards, up to 5.22, all of these kinds of
>> identifier can have non-ASCII characters if the source is in upgraded
>> form, and cannot if the source is downgraded.
>>
>>> While some limited functionality towards this does
>>> exist as of Perl 5.8.0, that is more accidental than designed; use of
>>> UTF-8 for the said purposes is unsupported.
>>
>> Yes, it does look accidental, and we do not in fact support it.
>>
>>> One reason of this unfinishedness is its (currently) inherent
>>> unportability: since both package names and subroutine names may need
>>> to be mapped to file and directory names, the Unicode capability of
>>> the filesystem becomes important-- and there unfortunately aren't
>>> portable answers.
>>
>> This is indeed another issue that has never been addressed and would
>> need to be resolved in order to fully support non-ASCII identifiers.
>> Package names are mapped to pathnames for module loading, and I believe
>> subroutine names for the old system of module splitting.
>> We essentially need a name mangling layer, to map language-supported
>> identifiers to inoffensive filenames. (Really, we already need a name
>> mangling layer to cope with case-insensitive filesystems, but in
>> practice we just put up with the resulting misbehaviour.)
>>
>> -zefram
>>
>
> The text in question is referring to under the 'use utf8' pragma. So
> it really does mean strict UTF-8, and not just non-ASCII.
>
> It appears that UTF-8 is supported in subroutine and package names.
> How does the following text look as a substitute for what is there now?
>
> BUGS
>
> Some filesystems may not support UTF-8 file names, or they may be
> supported incompatibly with Perl. Therefore UTF-8 names that are
> visible to the filesystem, such as module names may not work.
>

You have another problem with UTF-8 in identifiers, in that $é is not the same as $é (unless something between my email client and yours has normalised this message). This could make a difference if you try to import a sub from a package, and the source text of the package and of the importing script use different normalisation forms for the same identifier.

I would like to propose that identifiers added to the stash by Perl itself are stored in Normalization Form C. This would not stop a programmer from manipulating the stash directly with something such as ${"é::é"}, but in that case they are supposed to know what they are doing.

Thoughts please.
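
To make the normalisation point concrete, here is a minimal sketch. It uses a plain hash as a stand-in for the stash (that substitution is my assumption for illustration, not a description of how Perl populates the stash), and the variable names are hypothetical. Unicode::Normalize ships with Perl.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# The same visible name "é" spelled in two normalisation forms.
my $composed   = "\x{E9}";      # U+00E9, precomposed (NFC)
my $decomposed = "e\x{301}";    # "e" + U+0301 COMBINING ACUTE ACCENT (NFD)

# Hypothetical stand-in for a stash: names are looked up by exact string.
my %table;
$table{$composed} = 'sub defined in the package';

# An importer that spelled the name in decomposed form misses the entry.
my $found_raw = exists $table{$decomposed} ? 1 : 0;

# Normalising the name to NFC before the lookup finds it.
my $found_nfc = exists $table{ NFC($decomposed) } ? 1 : 0;

print "raw lookup: ", ($found_raw ? "found" : "missing"), "\n";  # missing
print "NFC lookup: ", ($found_nfc ? "found" : "missing"), "\n";  # found
```

If names were normalised to NFC on the way into the stash, the second kind of lookup is what every importer would effectively get.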