On 28/12/2015 17:22, demerphq wrote:
> On 28 December 2015 at 17:47, Father Chrysostomos <sprout@cpan.org> wrote:
>> John Imrie wrote:
>>> On 28/12/2015 01:01, Father Chrysostomos wrote:
>>>> But that is exactly how symbols are exported (assuming you mean *
>>>> rather than *). Normalization for hash access (which is how we
>>>> would have to do this), even if it is limited to stashes, would be
>>>> an efficiency nightmare.
>>> I was hoping that we could do the normalisation on insert into the
>>> stash, so that the stash itself was normalised. This would make it
>>> a compile-time operation, or one hit for each symbol you are
>>> exporting. I don't know enough of the Perl internals as to why this
>>> would have to be on access rather than insert.
>> If the string "whatever" used to ask for the symbol, as in
>>
>> use Foo "whatever";
>>
>> is not normalised, but *Foo::whatever is stored in the *Foo:: stash
>> normalised, then the symbol lookup that happens at run time every
>> time the symbol is exported will at some point have to normalise the
>> name provided by the caller.
> Just FYI, this already happens for utf8 keys in hashes.
>
> During fetch or store, any utf8 key will trigger a downgrade attempt.
>
> During store, if the downgrade is successful then the key will be
> marked as "was-utf8", so that later when it is fetched it will be
> upgraded. If it is not successful then the key will be looked up by
> its utf8 byte sequence.
>
> Combining characters of course are not "properly" downgraded.
>
> Anyway, the consequence of this is that unicode hash lookups are much
> slower than they could be.
>
> Yves

OK, so let me see if I've got this straight. Hash lookups are done on
bytes, because by the time the lookup is done the character semantics
have been removed. So a potentially really yucky solution would be to
special-case the stash at that point and normalise the lookup string
prior to the downgrade.

Ugh, I don't like it, so I went looking at other languages to see what
they do.

C# says identifiers should be in NFC. Python performs NFKC, which is
Compatibility Decomposition followed by Canonical Composition. That is
really yucky in my opinion, as it breaks up ligatures and makes things
like the Angstrom sign match A with a ring above.

After this I gave up.
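
For anyone who wants to poke at the downgrade behaviour Yves describes,
here is a small sketch (the key strings are just made-up examples):
a UTF8-flagged key whose characters all fit in Latin-1 is downgraded on
store, so it matches the plain byte-string spelling, while a
combining-character spelling stays a distinct key because no
normalisation ever happens.

    use strict;
    use warnings;

    my %h;

    # A UTF8-flagged key whose characters all fit in Latin-1...
    my $upgraded = "caf\x{e9}";
    utf8::upgrade($upgraded);      # force the UTF8 flag on
    $h{$upgraded} = 1;

    # ...is downgraded on store, so the plain byte string finds it.
    my $bytes = "caf\xe9";         # no UTF8 flag
    print exists $h{$bytes}
        ? "same key\n" : "different key\n";      # same key

    # But no normalisation is done: "e" followed by U+0301 COMBINING
    # ACUTE ACCENT is a different key from the precomposed e-acute.
    my $combining = "cafe\x{301}";
    print exists $h{$combining}
        ? "same key\n" : "different key\n";      # different key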
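
And to make the NFC/NFKC difference concrete, a quick sketch with the
core Unicode::Normalize module, using the two characters mentioned
above purely as illustrations:

    use strict;
    use warnings;
    use Unicode::Normalize qw(NFC NFKC);

    # U+FB01 LATIN SMALL LIGATURE FI: NFC leaves it alone, but NFKC
    # breaks it up into "fi", because the ligature only has a
    # compatibility decomposition.
    printf "NFC:  %vX\n", NFC("\x{FB01}");     # FB01
    printf "NFKC: %vX\n", NFKC("\x{FB01}");    # 66.69  ("f" . "i")

    # U+212B ANGSTROM SIGN is folded to U+00C5 (LATIN CAPITAL LETTER A
    # WITH RING ABOVE), so the two spellings become the same
    # identifier.
    printf "NFKC: %vX\n", NFKC("\x{212B}");    # C5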