On Mon, 23 Jul 2001 13:43:30 -0500
Jarkko Hietaniemi <jhi@iki.fi> wrote:

> Darn. Got me there, I am the one always warning people about the fact
> that Unicode is not 16 bit anymore :-)
>
> I think we should solve this somehow differently, different, I don't
> want to introduce a new huge-ish file (that is just a differently sorted
> version of an existing file) to just to do the binary search.

I think the searching method doesn't matter :-) so long as it is
appropriate and can also handle CJK Unified Ideographs and
Hangul syllables.

BTW, Hangul syllables must be decomposed canonically, mustn't they?

cf. DerivedDecompositionType-3.1.0.txt in Unicode 3.1

30FE          ; canonical # Lm         KATAKANA VOICED ITERATION MARK
AC00..D7A3    ; canonical # Lo [11172] HANGUL SYLLABLE GA..HANGUL SYLLABLE HIH
F900..FA0D    ; canonical # Lo   [270] CJK COMPATIBILITY IDEOGRAPH-F900..CJK COMPATIBILITY IDEOGRAPH-FA0D

But they are not included in lib/unicode/IsDecoCanon.pl.

And why does lib/unicode/IsCn.pl comprise no characters?
(see DerivedGeneralCategory-3.1.0.txt)

For example, like this?

 # 0x0590 is in the Hebrew block but unused.
-ok($charinfo->{category}, undef);
+ok($charinfo->{category}, 'Cn');

regards,
SADAHIRO Tomoyuki
E-mail: bqw10602@nifty.com
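
For reference, the canonical decomposition of the Hangul syllables AC00..D7A3 is normally computed arithmetically rather than looked up in a table, which is one way to cover all 11172 syllables without a large data file. The following is only a minimal sketch in Python, not the code in the Perl core; the function name decompose_hangul is hypothetical, and the constants are the ones given in the Unicode Standard's description of conjoining jamo behavior.

```python
# Sketch of arithmetic Hangul syllable decomposition.
# Constants follow the Unicode Standard; decompose_hangul is a
# hypothetical name chosen for this illustration.
S_BASE = 0xAC00   # first precomposed syllable (HANGUL SYLLABLE GA)
L_BASE = 0x1100   # first leading consonant jamo
V_BASE = 0x1161   # first vowel jamo
T_BASE = 0x11A7   # one before the first trailing consonant jamo
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172, matching AC00..D7A3


def decompose_hangul(cp):
    """Return the canonical decomposition of code point cp as a list.

    Code points outside AC00..D7A3 are returned unchanged.
    """
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [cp]
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trail = T_BASE + s_index % T_COUNT
    # A trail equal to T_BASE means the syllable has no trailing consonant.
    if trail == T_BASE:
        return [lead, vowel]
    return [lead, vowel, trail]
```

A range check like the one at the top of the function can be combined with whatever table search is used for the remaining characters, so the search method only has to cover the explicitly listed decompositions.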