[perl #58430] Unicode::UCD::casefold() does not work as documented, nor prob as intended
From: karl williamson
Date: August 29, 2008 05:28
Subject: [perl #58430] Unicode::UCD::casefold() does not work as documented, nor prob as intended
Message ID: rt-3.6.HEAD-29762-1219946564-598.58430-75-0@perl.org
# New Ticket Created by karl williamson
# Please include the string: [perl #58430]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=58430 >
This is a bug report for perl from corporate@khwilliamson.com,
generated with the help of perlbug 1.35 running under perl v5.8.8.
-----------------------------------------------------------------
The documentation claims that the casefold function returns an 'I' for
the special case of dotless lowercase i mapping. But there is no code
in the function that does that. casefold() uses the file
CaseFolding.txt from Unicode. The function is looking for an 'I' in
that file in the appropriate column, but the file uses a 'T' not an 'I'
for this purpose, so it never will be found.
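
To make the mismatch concrete, here is a minimal sketch (not the actual
Unicode::UCD source) that tallies the status letters found in a few lines
written in the CaseFolding.txt format. The sample lines are an abridged
excerpt, and the helper name count_statuses is made up for this example.

```perl
use strict;
use warnings;

# A few lines in CaseFolding.txt format: code; status; mapping; # name
my @sample = (
    '0049; C; 0069; # LATIN CAPITAL LETTER I',
    '0049; T; 0131; # LATIN CAPITAL LETTER I',
    '00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S',
    '0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE',
    '0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE',
);

# Hypothetical helper: count how often each status letter appears.
sub count_statuses {
    my %count;
    for my $line (@_) {
        next unless $line =~ /^([0-9A-F]+); ([A-Z]); ([0-9A-F ]+);/;
        $count{$2}++;
    }
    return \%count;
}

my $count = count_statuses(@sample);
printf "%s: %d\n", $_, $count->{$_} for sort keys %$count;

# A lookup for status 'I' finds nothing, because these lines only use C, F and T.
print "status 'I' seen: ", (exists $count->{I} ? $count->{I} : 0), "\n";
```

Any code path that waits for an 'I' in the status column is therefore dead
code as far as these entries are concerned.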
But it is a good thing that this bug exists, for otherwise, it would
generally return the wrong thing for the folding of an upper case 'I'.
The problem is that the file contains multiple entries for several
characters. This is nowhere indicated in the function's documentation,
and I'm not sure that the programmer realized it. It is not clear to me
what the proper behavior should be, except that the current behavior
isn't correct. Perhaps it should return hashes like casespec() does, to
allow the caller to choose which folding to do, or take a second
parameter to indicate which type to return.
For example, with a capital I, there are two entries, the first for the
normal case where 'I' maps to 'i', and the 2nd for where it maps to a
dotless i. The function populates a hash, and whatever entry comes last
in the file overwrites any earlier hash value. Thus if the function
were written to look for the T (instead of the non-existent I), the very
special case of Turkish would override the more likely case in any of a
number of other languages.
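
The overwriting described above can be reproduced with a small sketch. It
is not the real casefold() code; load_folds is a hypothetical loader that
keys its hash on the code point alone and, unlike the current function,
does recognize the 'T' status.

```perl
use strict;
use warnings;

# The two entries for capital I, in CaseFolding.txt format.
my @sample = (
    '0049; C; 0069; # LATIN CAPITAL LETTER I',
    '0049; T; 0131; # LATIN CAPITAL LETTER I',
);

# Hypothetical loader keyed only by code point, so a later line for the
# same code point silently replaces the earlier one.
sub load_folds {
    my ($lines) = @_;
    my %fold;
    for my $line (@$lines) {
        next unless $line =~ /^([0-9A-F]+); ([CFST]); ([0-9A-F ]+);/;
        $fold{$1} = { code => $1, status => $2, mapping => $3 };
    }
    return \%fold;
}

my $fold = load_folds(\@sample);
printf "U+0049 folds to U+%s (status %s)\n",
    $fold->{'0049'}{mapping}, $fold->{'0049'}{status};
```

With these two lines the surviving entry for 0049 is the 'T' one, mapping
to 0131 (dotless i), and the ordinary mapping to 0069 is lost.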
There are a number of other cases in the file where there are different
mappings for the same character, and the function will always use just
one mapping, the last one found in the file.
This is contrary to what the documentation implies, and I doubt that it
is an adequate interface to the database.
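
As a rough illustration of the alternatives suggested earlier in this
report, the sketch below keeps every mapping, keyed by status, and lets the
caller state which statuses it prefers. The names load_all_folds and
fold_for are hypothetical; this is one possible shape for such an
interface, not a proposal for the exact API.

```perl
use strict;
use warnings;

my @sample = (
    '0049; C; 0069; # LATIN CAPITAL LETTER I',
    '0049; T; 0131; # LATIN CAPITAL LETTER I',
    '00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S',
    '0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE',
    '0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE',
);

# Hypothetical loader: keep one mapping per status for each code point,
# so multiple entries for the same character no longer collide.
sub load_all_folds {
    my ($lines) = @_;
    my %fold;
    for my $line (@$lines) {
        next unless $line =~ /^([0-9A-F]+); ([CFST]); ([0-9A-F ]+);/;
        $fold{$1}{$2} = $3;
    }
    return \%fold;
}

# Hypothetical accessor: return the first mapping found among the
# statuses the caller lists, in order of preference.
sub fold_for {
    my ($fold, $code, @prefer) = @_;
    my $entry = $fold->{$code} or return;
    for my $status (@prefer) {
        return $entry->{$status} if exists $entry->{$status};
    }
    return;
}

my $fold = load_all_folds(\@sample);
print "default: ", scalar fold_for($fold, '0049', 'C', 'S'), "\n";
print "Turkic:  ", scalar fold_for($fold, '0049', 'T', 'C', 'S'), "\n";
```

Here the default lookup for 0049 yields 0069, while a caller that asks for
'T' first gets 0131, so neither mapping hides the other.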
-----------------------------------------------------------------
---
Flags:
category=library
severity=medium
---
Site configuration information for perl v5.8.8:
Configured by Debian Project at Tue Nov 27 10:56:10 GMT 2007.
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
Platform:
osname=linux, osvers=2.6.15.7, archname=i486-linux-gnu-thread-multi
uname='linux palmer 2.6.15.7 #1 smp thu sep 7 19:42:20 utc 2006
i686 gnulinux '
config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN
-Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr
-Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8
-Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5
-Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local
-Dsitelib=/usr/local/share/perl/5.8.8
-Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1
-Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1
-Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl
-Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio
-Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS
-DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN
-fno-strict-aliasing -pipe -I/usr/local/include'
ccversion='', gccversion='4.2.3 20071123 (prerelease) (Ubuntu
4.2.2-3ubuntu4)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
perllibs=-ldl -lm -lpthread -lc -lcrypt
libc=/lib/libc-2.6.1.so, so=so, useshrplib=true,
libperl=libperl.so.5.8.8
gnulibc_version='2.6.1'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
Locally applied patches:
---
@INC for perl v5.8.8:
/etc/perl
/usr/local/lib/perl/5.8.8
/usr/local/share/perl/5.8.8
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.8
/usr/share/perl/5.8
/usr/local/lib/site_perl
.
---
Environment for perl v5.8.8:
HOME=/home/khw
LANG=en_US.UTF-8
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/khw/bin:/home/khw/print/bin:/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/usr/games:/home/khw/cxoffice/bin
PERL_BADLANG (unset)
SHELL=/bin/ksh