develooper Front page | perl.perl5.porters | Postings from July 2001

Re: [PATCH @11446] UnicodeCD::charinfo

July 25, 2001 08:57
On Mon, 23 Jul 2001 13:43:30 -0500
Jarkko Hietaniemi wrote:
> Darn.  Got me there, I am the one always warning people about the fact
> that Unicode is not 16 bit anymore :-)
> I think we should solve this somehow differently; I don't
> want to introduce a new huge-ish file (that is just a differently sorted
> version of an existing file) just to do the binary search.

I think the searching method doesn't matter, :-)
so long as it is appropriate and also able to handle
CJK Unified Ideographs and Hangul syllables.
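
To illustrate what "able to handle" could mean here, below is a minimal sketch of a binary search over a sorted table of ranges rather than over single code points. The table @RANGES and the sub find_range are hypothetical names made up for this example, not anything in the patch; the four ranges are the First/Last pairs as I understand them from Unicode 3.1, and the labels are only illustrative.

```perl
use strict;
use warnings;

# Hypothetical table: [first, last, label], sorted by first, non-overlapping.
# The last entry lies beyond 0xFFFF, so 16-bit assumptions would miss it.
my @RANGES = (
    [0x3400,  0x4DB5,  'CJK Ideograph Extension A'],
    [0x4E00,  0x9FA5,  'CJK Ideograph'],
    [0xAC00,  0xD7A3,  'Hangul Syllable'],
    [0x20000, 0x2A6D6, 'CJK Ideograph Extension B'],
);

# Binary search: return the label of the range containing $cp,
# or an empty list / undef if no range contains it.
sub find_range {
    my ($cp, $ranges) = @_;
    my ($lo, $hi) = (0, $#$ranges);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my ($first, $last, $label) = @{ $ranges->[$mid] };
        if    ($cp < $first) { $hi = $mid - 1 }
        elsif ($cp > $last)  { $lo = $mid + 1 }
        else                 { return $label }
    }
    return;
}

for my $cp (0x4E00, 0xAC01, 0x20000, 0x0041) {
    my $label = find_range($cp, \@RANGES);
    printf "U+%04X => %s\n", $cp, defined $label ? $label : '(not in table)';
}
```

The point is only that a range table needs no separately sorted copy of the per-character data.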

BTW, Hangul syllables must be decomposed canonically, mustn't they?

cf. DerivedDecompositionType-3.1.0.txt in Unicode 3.1

  30FE        ; canonical # Lm       KATAKANA VOICED ITERATION MARK
  AC00..D7A3  ; canonical # Lo [11172] HANGUL SYLLABLE GA
                                     ..HANGUL SYLLABLE HIH
  F900..FA0D  ; canonical # Lo [270] CJK COMPATIBILITY IDEOGRAPH-F900
                                   ..CJK COMPATIBILITY IDEOGRAPH-FA0D

but they are not included in lib/unicode/
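
For reference, the Hangul decompositions need not be stored at all, since the Unicode Standard defines them arithmetically. A rough sketch follows; the constants are the ones given in the standard's Hangul syllable decomposition algorithm, while the sub name decompose_hangul is just something I made up for illustration.

```perl
use strict;
use warnings;

# Constants from the Unicode Standard's Hangul decomposition algorithm.
use constant {
    SBase  => 0xAC00,   # first precomposed syllable (GA)
    LBase  => 0x1100,   # first leading consonant
    VBase  => 0x1161,   # first vowel
    TBase  => 0x11A7,   # one before the first trailing consonant
    LCount => 19,
    VCount => 21,
    TCount => 28,
};
use constant NCount => VCount * TCount;   # 588
use constant SCount => LCount * NCount;   # 11172, matching the [11172] above

# Return the canonical decomposition of a Hangul syllable as a list of
# jamo code points; any other code point is returned unchanged.
sub decompose_hangul {
    my ($cp) = @_;
    my $s_index = $cp - SBase;
    return ($cp) if $s_index < 0 || $s_index >= SCount;

    my $l = LBase + int($s_index / NCount);
    my $v = VBase + int(($s_index % NCount) / TCount);
    my $t = TBase + $s_index % TCount;

    # A trailing index of zero means the syllable has no final consonant.
    return $t == TBase ? ($l, $v) : ($l, $v, $t);
}

printf "U+%04X => %s\n", $_,
    join ' ', map { sprintf 'U+%04X', $_ } decompose_hangul($_)
    for 0xAC00, 0xD7A3;
```

The CJK compatibility ideographs (F900..FA0D) have no such formula; their mappings have to come from the data file.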

and why does lib/unicode/ comprise no characters?
 (see DerivedGeneralCategory-3.1.0.txt)

For example, like this?
# 0x0590 is in the Hebrew block but unused.
-ok($charinfo->{category},      undef);
+ok($charinfo->{category},      'Cn');
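
One possible way to get that behaviour, sketched under the assumption that the lookup table lists every assigned (and otherwise categorised) code point, so that anything missing is Cn. The hash %CATEGORY and the sub category_of are hypothetical stand-ins, not the real charinfo internals.

```perl
use strict;
use warnings;

# Hypothetical, tiny category table keyed by code point.
my %CATEGORY = (
    0x0041 => 'Lu',   # LATIN CAPITAL LETTER A
    0x0591 => 'Mn',   # HEBREW ACCENT ETNAHTA
);

# Return the general category, defaulting to 'Cn' (unassigned)
# instead of undef when the code point is not in the table.
sub category_of {
    my ($cp) = @_;
    return exists $CATEGORY{$cp} ? $CATEGORY{$cp} : 'Cn';
}

printf "U+%04X => %s\n", $_, category_of($_) for 0x0041, 0x0590, 0x0591;
```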

