
Need: list of Unicode characters that have canonical decompositions.

From: BobH
Date: June 27, 2011 07:27
Subject: Need: list of Unicode characters that have canonical decompositions.
Message ID: 20110627142659.11628.qmail@lists-nntp.develooper.com

A project I'm working on needs to build a list of all Unicode characters 
that have canonical decompositions. The most efficient ways I can think 
of to get such a list are from unicore/Decomposition.pl or by scanning 
unicore/UnicodeData.txt. However:

Re unicore/Decomposition.pl, its header says:

> # !!!!!!!   INTERNAL PERL USE ONLY   !!!!!!!
> # This file is for internal use by the Perl program only.  The format and even
> # the name or existence of this file are subject to change without notice.
> # Don't use it directly.

Re unicore/UnicodeData.txt, I recently uploaded to CPAN a version of my 
module that reads unicore/UnicodeData.txt, and every report I've received 
from Perl 5.14 testers has been a failure notice indicating that the file 
cannot be found :-(
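
For reference, the scan I have in mind looks roughly like the sketch below. 
It assumes the standard UnicodeData.txt layout, where the sixth 
semicolon-separated field holds the decomposition mapping and compatibility 
mappings start with a <tag>. The sub name and the three in-memory sample 
lines are only for illustration; real use would open the file itself, which 
is exactly what fails on the testers' machines.

```perl
use strict;
use warnings;

# Return the code points whose decomposition field (sixth field) is
# non-empty and carries no <tag>, i.e. a canonical mapping.
# canonical_from_unicodedata is a hypothetical name used for this sketch.
sub canonical_from_unicodedata {
    my ($fh) = @_;
    my @found;
    while (my $line = <$fh>) {
        my ($code, $decomp) = (split /;/, $line, -1)[0, 5];
        next unless defined $decomp && length $decomp;
        next if $decomp =~ /^</;    # <compat>, <noBreak>, ... are not canonical
        push @found, hex $code;
    }
    return \@found;
}

# Sample lines in UnicodeData.txt format, standing in for the real file.
my $sample = join "\n",
    '0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;',
    '00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;',
    '00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;LATIN CAPITAL LETTER A GRAVE;;;00E0;',
    '';
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my $cps = canonical_from_unicodedata($fh);
printf "U+%04X\n", $_ for @$cps;
```

Note that a scan like this would miss the precomposed Hangul syllables, 
since UnicodeData.txt lists them only as a range and their decompositions 
are derived algorithmically.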

Unicode::UCD can tell me if a specific character has a decomposition, 
but can't give me a list of characters that have decompositions.
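
The only fallback I can see is brute force over every code point. The 
sketch below does that with getCanon() from Unicode::Normalize rather than 
Unicode::UCD, on the assumption that getCanon() returning a defined value is 
an acceptable test for "has a canonical decomposition"; the sub name is just 
for illustration.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(getCanon);

# Collect every code point for which getCanon() returns a defined value.
# This includes precomposed Hangul syllables, which decompose
# algorithmically. canonically_decomposable is a hypothetical name.
sub canonically_decomposable {
    my @found;
    for my $cp (0 .. 0x10FFFF) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;    # skip surrogates
        push @found, $cp if defined getCanon($cp);
    }
    return \@found;
}

my $list = canonically_decomposable();
printf "%d code points have canonical decompositions\n", scalar @$list;
```

That is over a million calls just to build a static list, which is why I'd 
rather read the data from a table directly.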

Any suggestions would be appreciated.

Bob
