John Imrie wrote:
> On 07/30/10 21:17, karl williamson wrote:
>> The problem is that we have to look up in both directions. viacode()
>> takes a code point number and returns the official Unicode name. We
>> want that official name to have the correct spaces and hyphens. We
>> don't want it to be "ZEROWIDTHSPACE", for example. The only
>> reasonable way to do this is to have the official name stored
>> correctly. That means we have to have a table with all the correct
>> official names. There's no getting around that.
>>
> How about storing the loose matching name against the code point instead
> of the official name? This gives loose matching via \N{}. Then make
> viacode() a two-step process: first find the loose name, and then use that
> as a key to a lookup for the official name.
>

I don't think I follow this.

> As has been said elsewhere, viacode() is not used that often, and the
> code-to-official-name mapping could be cached once used. As a number of the
> names can be programmatically reconstituted from the loose name, e.g. CJK
> COMPATIBILITY IDEOGRAPH-2F801, calls for these could be intercepted and
> memory saved by not having them in the table.
>

Maybe you're saying we have two tables but save space by not having the
programmatically determinable names in those tables. But I've already
removed all the programmatically determinable names from the tables.
The statistics I gave are for these pared-down tables. They're still huge.

> John
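A minimal sketch of the two-step lookup discussed above, written in Python purely for illustration; it is not how Perl's charnames module is implemented. Everything in it is an assumption made for the example: the three-entry sample table, the helper names (`loose_key`, `vianame`, and the Python `viacode`), the simplified loose-matching rule (drop whitespace, underscores and hyphens, then uppercase), and the single programmatic range used for the CJK COMPATIBILITY IDEOGRAPH names.

```python
import re
from functools import lru_cache

# Hypothetical sample data: a tiny stand-in for the real name table.
OFFICIAL_SAMPLE = {
    0x002D: "HYPHEN-MINUS",
    0x00E9: "LATIN SMALL LETTER E WITH ACUTE",
    0x200B: "ZERO WIDTH SPACE",
}


def loose_key(name: str) -> str:
    """Simplified loose form: strip whitespace, underscores, hyphens; uppercase."""
    return re.sub(r"[\s_-]", "", name).upper()


# Table 1: code point <-> loose name (what \N{} style lookups would use).
CODE_TO_LOOSE = {cp: loose_key(n) for cp, n in OFFICIAL_SAMPLE.items()}
LOOSE_TO_CODE = {k: cp for cp, k in CODE_TO_LOOSE.items()}

# Table 2: loose name -> official name. The correctly spaced and hyphenated
# names still have to be stored somewhere; this is that table.
LOOSE_TO_OFFICIAL = {loose_key(n): n for n in OFFICIAL_SAMPLE.values()}

# Names that can be reconstituted from the code point are kept out of both
# tables. The range below is an assumption chosen for the example.
CJK_COMPAT_RANGE = range(0x2F800, 0x2FA1E)
CJK_PREFIX = "CJK COMPATIBILITY IDEOGRAPH-"
CJK_LOOSE = re.compile(r"CJKCOMPATIBILITYIDEOGRAPH([0-9A-F]{4,5})")


@lru_cache(maxsize=None)  # cache code -> official name once used
def viacode(cp: int):
    """Code point -> official name, via the loose name as an intermediate key."""
    if cp in CJK_COMPAT_RANGE:
        return f"{CJK_PREFIX}{cp:04X}"  # intercepted: no table entry needed
    loose = CODE_TO_LOOSE.get(cp)
    return None if loose is None else LOOSE_TO_OFFICIAL[loose]


def vianame(name: str):
    """Loosely matched name -> code point."""
    key = loose_key(name)
    m = CJK_LOOSE.fullmatch(key)
    if m:
        cp = int(m.group(1), 16)
        return cp if cp in CJK_COMPAT_RANGE else None
    return LOOSE_TO_CODE.get(key)
```

In this sketch, `vianame("zero_width space")` and `vianame("ZEROWIDTHSPACE")` both resolve to U+200B, while `viacode(0x200B)` goes code point to loose name to official name. Note that the second table still holds every official name that cannot be generated programmatically, so the scheme saves space only for names like the CJK ones, which is the case already excluded from the statistics.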