perl.perl5.porters
Postings from August 2008
[perl #58428] Unicode::UCD::charinfo() does not work on 21 Han codepoints
From: karl williamson
Date: August 29, 2008 05:28
Subject: [perl #58428] Unicode::UCD::charinfo() does not work on 21 Han codepoints
Message ID: rt-3.6.HEAD-29762-1219941104-382.58428-75-0@perl.org
# New Ticket Created by karl williamson
# Please include the string: [perl #58428]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=58428 >
This is a bug report for perl from corporate@khwilliamson.com,
generated with the help of perlbug 1.35 running under perl v5.8.8.
-----------------------------------------------------------------
charinfo() returns undef when called with any of the 21 CJK Ideographs
from U+9FA6 through U+9FBA inclusive.
use Unicode::UCD 'charinfo';
print ((defined charinfo(0x9FB0)) ? "defined" : "undefined");
print "\n";
will print 'undefined', even though this is a valid character in Unicode
4.1.
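To see the whole affected range rather than a single code point, the one-line test above can be extended to loop over all 21 code points. This is a minimal sketch that assumes only the core Unicode::UCD module. The result depends on the installed Unicode::UCD version, so no expected output is shown.

```perl
#!/usr/bin/perl
# Sketch: run the report's test over every code point from U+9FA6 through
# U+9FBA instead of only U+9FB0. Which ones print "undefined" depends on the
# installed Unicode::UCD version; an affected version (the report cites
# 0.25) is expected to report all 21.
use strict;
use warnings;
use Unicode::UCD 'charinfo';

my @codepoints = (0x9FA6 .. 0x9FBA);    # 21 code points, inclusive
my $missing    = 0;

for my $cp (@codepoints) {
    my $info = charinfo($cp);           # hash ref, or undef if not found
    if (defined $info) {
        printf "U+%04X: defined (%s)\n", $cp, $info->{name};
    }
    else {
        printf "U+%04X: undefined\n", $cp;
        $missing++;
    }
}

printf "%d of %d code points undefined\n", $missing, scalar @codepoints;
```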
This is because UCD.pm hard-codes the upper bound of this range (at
line 183 in version 0.25), and that bound has changed in more recent
versions of the Unicode standard. (It changes again in Unicode 5.1, so
even more code points will be handled incorrectly.)
Changing line 183 to:
[ 0x4E00, 0x9FBB, \&han_charname, undef ],
causes charinfo() to work for Unicode up through version 5.0, but not
for 5.1.
Note that the correct upper bound for the installed Unicode version can
be derived from UnicodeData.txt, a file that charinfo() already reads.
That file contains an entry for the first and for the last character of
each range that this module hard-codes. For this range, the entries are:
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FBB;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
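The derivation suggested above could look roughly like the following. This is a hypothetical sketch, not code from UCD.pm: the subroutine name derive_ranges is made up for illustration, and it reads from a filehandle rather than locating UnicodeData.txt the way charinfo() does. It pairs each "<..., First>" entry with the matching "<..., Last>" entry and returns the resulting ranges.

```perl
#!/usr/bin/perl
# Hypothetical sketch (derive_ranges is an illustrative name, not part of
# Unicode::UCD): derive the ranges that UCD.pm hard-codes from the
# "<..., First>" / "<..., Last>" entry pairs in UnicodeData.txt.
use strict;
use warnings;

# Returns a list of [first, last, name] array refs, one per range,
# read from a filehandle positioned on UnicodeData.txt-format data.
sub derive_ranges {
    my ($fh) = @_;
    my (@ranges, %open);
    while (my $line = <$fh>) {
        my ($hex, $name) = split /;/, $line;
        next unless defined $name;
        if ($name =~ /^<(.+), First>$/) {
            $open{$1} = hex $hex;                 # remember the range start
        }
        elsif ($name =~ /^<(.+), Last>$/ && exists $open{$1}) {
            push @ranges, [ delete $open{$1}, hex $hex, $1 ];
        }
    }
    return @ranges;
}

# Example using the two entries quoted in this report; in practice the
# handle would be opened on the UnicodeData.txt that charinfo() reads.
my $data = <<'END';
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FBB;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
END
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
for my $r (derive_ranges($fh)) {
    printf "%s: U+%04X..U+%04X\n", $r->[2], $r->[0], $r->[1];
}
```

With the two entries quoted above it prints "CJK Ideograph: U+4E00..U+9FBB". Run against a newer UnicodeData.txt, the same loop would pick up the changed bound without any edit to the module.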
-----------------------------------------------------------------
---
Flags:
category=library
severity=low
---
Site configuration information for perl v5.8.8:
Configured by Debian Project at Tue Nov 27 10:56:10 GMT 2007.
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
Platform:
osname=linux, osvers=2.6.15.7, archname=i486-linux-gnu-thread-multi
uname='linux palmer 2.6.15.7 #1 smp thu sep 7 19:42:20 utc 2006
i686 gnulinux '
config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN
-Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr
-Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8
-Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5
-Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local
-Dsitelib=/usr/local/share/perl/5.8.8
-Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1
-Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1
-Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl
-Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio
-Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS
-DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN
-fno-strict-aliasing -pipe -I/usr/local/include'
ccversion='', gccversion='4.2.3 20071123 (prerelease) (Ubuntu
4.2.2-3ubuntu4)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
perllibs=-ldl -lm -lpthread -lc -lcrypt
libc=/lib/libc-2.6.1.so, so=so, useshrplib=true,
libperl=libperl.so.5.8.8
gnulibc_version='2.6.1'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
Locally applied patches:
---
@INC for perl v5.8.8:
/etc/perl
/usr/local/lib/perl/5.8.8
/usr/local/share/perl/5.8.8
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.8
/usr/share/perl/5.8
/usr/local/lib/site_perl
.
---
Environment for perl v5.8.8:
HOME=/home/khw
LANG=en_US.UTF-8
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/khw/bin:/home/khw/print/bin:/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/usr/games:/home/khw/cxoffice/bin
PERL_BADLANG (unset)
SHELL=/bin/ksh