On 07/30/10 21:17, karl williamson wrote:
> The problem is that we have to look-up in both directions. viacode()
> takes a code point number and returns the official Unicode name. We
> want that official name to have the correct spaces and hyphens. We
> don't want it to be "ZEROWIDTHSPACE", for example. The only
> reasonable way to do this is to have the official name stored
> correctly. That means we have to have a table with all the correct
> official names. There's no getting around that.
>

How about storing the loose matching name against the code point instead of the official name? This gives loose matching via \N{}. Then make viacode() a two-step process: first find the loose name, then use that as a key to look up the official name. As has been said elsewhere, viacode() is not used that often, and the code-point-to-official-name result could be cached once used.

As a number of the names can be programmatically reconstituted from the loose name, e.g. CJK COMPATIBILITY IDEOGRAPH-2F801, calls for these could be intercepted and memory saved by not having them in the table.

John
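
To make the suggestion concrete, here is a rough sketch of the two-step lookup, written in Python purely for illustration. It is not how charnames is implemented. The table names (LOOSE_BY_CODE, OFFICIAL_BY_LOOSE), the helper loose_key(), and the two-entry data set are all made up for the example, and loose_key() only strips whitespace, hyphens and underscores, whereas the real Unicode loose matching rule has extra cases for hyphens that are ignored here.

```python
import re
from functools import lru_cache

# Hypothetical toy tables; a real build would generate these from UnicodeData.txt.
LOOSE_BY_CODE = {
    0x0041: "LATINCAPITALLETTERA",
    0x200B: "ZEROWIDTHSPACE",
}
OFFICIAL_BY_LOOSE = {
    "LATINCAPITALLETTERA": "LATIN CAPITAL LETTER A",
    "ZEROWIDTHSPACE": "ZERO WIDTH SPACE",
}
# Reverse index used for name -> code point (the \N{} direction).
CODE_BY_LOOSE = {loose: cp for cp, loose in LOOSE_BY_CODE.items()}

# One family of algorithmic names, kept out of the tables entirely.
CJK_COMPAT_LOOSE = "CJKCOMPATIBILITYIDEOGRAPH"
CJK_COMPAT_OFFICIAL = "CJK COMPATIBILITY IDEOGRAPH-"
CJK_COMPAT_RANGE = range(0x2F800, 0x2FA1E)  # U+2F800..U+2FA1D


def loose_key(name):
    """Simplified loose form: drop whitespace, hyphens, underscores; upper-case."""
    return re.sub(r"[\s_-]", "", name).upper()


def vianame(name):
    """Name -> code point, using loose matching."""
    key = loose_key(name)
    if key.startswith(CJK_COMPAT_LOOSE):
        suffix = key[len(CJK_COMPAT_LOOSE):]
        if re.fullmatch(r"[0-9A-F]{4,6}", suffix):
            cp = int(suffix, 16)
            if cp in CJK_COMPAT_RANGE:
                return cp
        return None
    return CODE_BY_LOOSE.get(key)


@lru_cache(maxsize=None)  # cache the official name once it has been looked up
def viacode(cp):
    """Code point -> official name, as a two-step lookup."""
    if cp in CJK_COMPAT_RANGE:
        # Intercepted: the name is rebuilt from the code point, no table entry.
        return f"{CJK_COMPAT_OFFICIAL}{cp:X}"
    loose = LOOSE_BY_CODE.get(cp)        # step 1: code point -> loose name
    if loose is None:
        return None
    return OFFICIAL_BY_LOOSE[loose]      # step 2: loose name -> official name


if __name__ == "__main__":
    print(viacode(0x200B))
    print(viacode(0x2F801))
    print(vianame("zero-width space"))
```

Note that a loose-name-to-official-name table is still needed for the names that cannot be derived; in this sketch the memory saving comes only from the intercepted range, which has no entries in either table.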