
Re: [PATCH @11446] UnicodeCD::charinfo

From: SADAHIRO Tomoyuki
Date: July 25, 2001 08:57
Subject: Re: [PATCH @11446] UnicodeCD::charinfo
Message ID: 20010726005355.5F12.BQW10602@nifty.com

On Mon, 23 Jul 2001 13:43:30 -0500
Jarkko Hietaniemi <jhi@iki.fi> wrote:
 
> Darn.  Got me there, I am the one always warning people about the fact
> that Unicode is not 16 bit anymore :-)
> 
> I think we should solve this somehow differently, different, I don't
> want to introduce a new huge-ish file (that is just a differently sorted
> version of an existing file) to just to do the binary search.

I think the search method doesn't matter :-)
so long as it is appropriate and can also handle
CJK Unified Ideographs and Hangul syllables
(UnicodeData.txt lists these only as First/Last range pairs).
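
A minimal sketch of a range-aware lookup, assuming the table is held
in memory as sorted, non-overlapping [first, last, label] triples.
The sub name find_range and the table layout are hypothetical and not
part of UnicodeCD; the code points are the Unicode 3.1 ranges for CJK
Unified Ideographs (including Extensions A and B) and Hangul syllables.

```perl
use strict;
use warnings;

# Hypothetical in-memory table: sorted, non-overlapping [first, last, label].
my @ranges = (
    [ 0x3400,  0x4DB5,  'CJK Ideograph Extension A' ],
    [ 0x4E00,  0x9FA5,  'CJK Ideograph' ],
    [ 0xAC00,  0xD7A3,  'Hangul Syllable' ],
    [ 0x20000, 0x2A6D6, 'CJK Ideograph Extension B' ],
);

# Binary search over ranges rather than single code points.
# Returns the label of the containing range, or undef if there is none.
sub find_range {
    my ($code) = @_;
    my ($lo, $hi) = (0, $#ranges);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my ($first, $last, $label) = @{ $ranges[$mid] };
        if    ($code < $first) { $hi = $mid - 1 }
        elsif ($code > $last)  { $lo = $mid + 1 }
        else                   { return $label }
    }
    return;
}

for my $code (0x4E00, 0xD7A3, 0x20000, 0x0590) {
    my $label = find_range($code);
    printf "%04X: %s\n", $code, defined $label ? $label : 'no range';
}
```

Comparing against both ends of each entry means code points above
0xFFFF and whole blocks need no per-character rows in the table.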

BTW, Hangul syllables must be decomposed canonically, mustn't they?

cf. DerivedDecompositionType-3.1.0.txt in Unicode 3.1

  30FE        ; canonical # Lm       KATAKANA VOICED ITERATION MARK
  AC00..D7A3  ; canonical # Lo [11172] HANGUL SYLLABLE GA
                                     ..HANGUL SYLLABLE HIH
  F900..FA0D  ; canonical # Lo [270] CJK COMPATIBILITY IDEOGRAPH-F900
                                   ..CJK COMPATIBILITY IDEOGRAPH-FA0D

but they are not included in lib/unicode/IsDecoCanon.pl.
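
Hangul syllables have no per-character decomposition entries because
their canonical decomposition is computed arithmetically. A minimal
sketch of that arithmetic follows, using the constants given in the
Unicode Standard; the sub name decompose_hangul is hypothetical and
is not an existing UnicodeCD function.

```perl
use strict;
use warnings;

# Constants from the Unicode Standard's Hangul decomposition algorithm.
my ($SBase, $LBase, $VBase, $TBase) = (0xAC00, 0x1100, 0x1161, 0x11A7);
my ($LCount, $VCount, $TCount) = (19, 21, 28);
my $NCount = $VCount * $TCount;    # 588
my $SCount = $LCount * $NCount;    # 11172, the size of AC00..D7A3

# Hypothetical helper: returns the canonical decomposition of a Hangul
# syllable as a list of jamo code points, or an empty list for any
# code point outside AC00..D7A3.
sub decompose_hangul {
    my ($code) = @_;
    my $sindex = $code - $SBase;
    return () if $sindex < 0 || $sindex >= $SCount;
    my $l = $LBase + int($sindex / $NCount);
    my $v = $VBase + int(($sindex % $NCount) / $TCount);
    my $t = $TBase + $sindex % $TCount;
    # A trailing index of zero means the syllable has no final consonant.
    return $t == $TBase ? ($l, $v) : ($l, $v, $t);
}

for my $code (0xAC00, 0xD7A3) {
    my @jamo = decompose_hangul($code);
    printf "%04X => %s\n", $code, join ' ', map { sprintf '%04X', $_ } @jamo;
}
```

A range check of this kind could also serve as the membership test
for AC00..D7A3 without listing all 11172 syllables individually.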

Also, why does lib/unicode/IsCn.pl contain no characters?
 (see DerivedGeneralCategory-3.1.0.txt)

For example, should the test read like this?
# 0x0590 is in the Hebrew block but unused.
-ok($charinfo->{category},      undef);
+ok($charinfo->{category},      'Cn');

regards,
SADAHIRO Tomoyuki
E-mail: bqw10602@nifty.com

