
[perl #58428] Unicode::UCD::charinfo() does not work on 21 Han codepoints

From:
karl williamson
Date:
August 29, 2008 05:28
Subject:
[perl #58428] Unicode::UCD::charinfo() does not work on 21 Han codepoints
Message ID:
rt-3.6.HEAD-29762-1219941104-382.58428-75-0@perl.org
# New Ticket Created by  karl williamson 
# Please include the string:  [perl #58428]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=58428 >


This is a bug report for perl from corporate@khwilliamson.com,
generated with the help of perlbug 1.35 running under perl v5.8.8.


-----------------------------------------------------------------
charinfo() returns undef (rather than a hash reference) when called with 
any of the 21 CJK Ideographs between U+9FA6 and U+9FBA.

use Unicode::UCD 'charinfo';
# No parenthesis directly after print, so the whole ?: expression is printed.
print defined(charinfo(0x9FB0)) ? "defined" : "undefined", "\n";

will print 'undefined', even though this is a valid character in Unicode 
4.1.

This is because UCD.pm has the upper bound for this range hard-coded 
into it (at line 183 in version 0.25) and the bound has changed with 
more recent versions of the Unicode standard.  (The bound has changed 
again for Unicode 5.1, so even more codepoints will be handled incorrectly.)

Changing line 183 to:
[ 0x4E00,   0x9FBB,   \&han_charname,   undef  ],

causes charinfo() to work for Unicode up through version 5.0, but not 
for 5.1.
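
To illustrate why the hard-coded bound matters, here is a minimal sketch 
of an inclusive range-table lookup in the style of the line above. The 
table layout mirrors that line, but lookup_name(), the body of 
han_charname(), and the old bound of 0x9FA5 are assumptions made for 
illustration; they are not copied from UCD.pm.

```perl
use strict;
use warnings;

# Stand-in for the module's name generator; the real body may differ.
sub han_charname { return sprintf "CJK UNIFIED IDEOGRAPH-%04X", shift }

# Hypothetical lookup: return a name if $code falls inside any
# [ first, last, name-generator ] range (bounds inclusive), else undef.
sub lookup_name {
    my ($code, @table) = @_;
    for my $range (@table) {
        my ($first, $last, $namer) = @$range;
        return $namer->($code) if $first <= $code && $code <= $last;
    }
    return;
}

my @old_table = ( [ 0x4E00, 0x9FA5, \&han_charname ] );  # assumed old bound
my @new_table = ( [ 0x4E00, 0x9FBB, \&han_charname ] );  # bound from the fix

for my $table (\@old_table, \@new_table) {
    my $name = lookup_name(0x9FB0, @$table);
    print defined($name) ? $name : "undefined", "\n";
}
```

With the assumed old bound the lookup for U+9FB0 falls through every 
range and yields undef; with the raised bound it is found.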

Note that the correct upper bound for the current Unicode version is 
derivable from UnicodeData.txt, a file that charinfo() already reads. 
There are entries in that file for the first and last characters of the 
ranges that this module hard-codes.  Here are the entries in that file 
for this range:

4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FBB;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
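
A minimal sketch of that derivation, assuming the First/Last entries 
always use the format shown above. The name ranges_from_unicodedata() is 
hypothetical and not part of Unicode::UCD, and the sample input is only 
the two lines quoted above, not the real file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Unicode::UCD): pair up the
# "<Name, First>" and "<Name, Last>" entries found in
# UnicodeData.txt-style lines and return [ first, last, name ] triples.
sub ranges_from_unicodedata {
    my @lines = @_;
    my (@ranges, %first);
    for my $line (@lines) {
        my ($hex, $name) = split /;/, $line;
        next unless defined $name;
        next unless $name =~ /^<(.+), (First|Last)>$/;
        my ($range, $end) = ($1, $2);
        if ($end eq 'First') {
            $first{$range} = hex $hex;
        }
        elsif (exists $first{$range}) {
            push @ranges, [ delete $first{$range}, hex $hex, $range ];
        }
    }
    return @ranges;
}

# Sample input: only the two entries quoted above, not the whole file.
my @sample = (
    '4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;',
    '9FBB;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;',
);

for my $r (ranges_from_unicodedata(@sample)) {
    printf "%04X..%04X %s\n", @$r;   # prints "4E00..9FBB CJK Ideograph"
}
```

A table built this way would track whatever UnicodeData.txt ships with 
the installed perl, so the bound would not need editing for each new 
Unicode version.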
-----------------------------------------------------------------
---
Flags:
     category=library
     severity=low
---
Site configuration information for perl v5.8.8:

Configured by Debian Project at Tue Nov 27 10:56:10 GMT 2007.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
   Platform:
     osname=linux, osvers=2.6.15.7, archname=i486-linux-gnu-thread-multi
     uname='linux palmer 2.6.15.7 #1 smp thu sep 7 19:42:20 utc 2006 
i686 gnulinux '
     config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN 
-Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr 
-Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 
-Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 
-Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local 
-Dsitelib=/usr/local/share/perl/5.8.8 
-Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1 
-Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 
-Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl 
-Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio 
-Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
     hint=recommended, useposix=true, d_sigaction=define
     usethreads=define use5005threads=undef useithreads=define 
usemultiplicity=define
     useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
     use64bitint=undef use64bitall=undef uselongdouble=undef
     usemymalloc=n, bincompat5005=undef
   Compiler:
     cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS 
-DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include 
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
     optimize='-O2',
     cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN 
-fno-strict-aliasing -pipe -I/usr/local/include'
     ccversion='', gccversion='4.2.3 20071123 (prerelease) (Ubuntu 
4.2.2-3ubuntu4)', gccosandvers=''
     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
     ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', 
lseeksize=8
     alignbytes=4, prototype=define
   Linker and Libraries:
     ld='cc', ldflags =' -L/usr/local/lib'
     libpth=/usr/local/lib /lib /usr/lib
     libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
     perllibs=-ldl -lm -lpthread -lc -lcrypt
     libc=/lib/libc-2.6.1.so, so=so, useshrplib=true, 
libperl=libperl.so.5.8.8
     gnulibc_version='2.6.1'
   Dynamic Linking:
     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
     cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:


---
@INC for perl v5.8.8:
     /etc/perl
     /usr/local/lib/perl/5.8.8
     /usr/local/share/perl/5.8.8
     /usr/lib/perl5
     /usr/share/perl5
     /usr/lib/perl/5.8
     /usr/share/perl/5.8
     /usr/local/lib/site_perl
     .

---
Environment for perl v5.8.8:
     HOME=/home/khw
     LANG=en_US.UTF-8
     LANGUAGE (unset)
     LD_LIBRARY_PATH (unset)
     LOGDIR (unset)
     PATH=/home/khw/bin:/home/khw/print/bin:/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/usr/games:/home/khw/cxoffice/bin
     PERL_BADLANG (unset)
     SHELL=/bin/ksh



