perl.perl5.porters | Postings from February 2006

[perl #38619] Bug in lc and uc (interaction between UTF-8, substr, and lc/uc)

perl @ benizi . com
February 23, 2006 01:08
# New Ticket Created by 
# Please include the string:  [perl #38619]
# in the subject line of all future correspondence about this issue. 
# <URL: >

This is a bug report for perl from,
generated with the help of perlbug 1.35 running under perl v5.8.7.

[Please enter your report here]

Problem with lc/uc interacting with substr and _utf8_on.

The second substr(lc($var), 0) on the same _utf8_on'ed $var returns a string of
the wrong length. In preliminary tests, the result seems to be limited to the
length of the first substr(lc($var), 0) result. Adding further iterations leads
to further weirdness. The test program below can be called as:

perl [test-string]
Test string will be split on /:/, defaults to 'a:bc'.

For each string in the split:
  _utf8_on, and print string <TAB> substr(lc(string), 0)

Output should be:
  string1 <TAB> string1
  string2 <TAB> string2

Actual output is:
  string1 <TAB> string1
  string2 <TAB> string3
(where string3 is the first length(string1) characters of string2)

# sample program demonstrating problem
$ cat
#!/usr/bin/perl -l
use strict;
use warnings;
use Encode qw/_utf8_on/;
for (split /:/, shift||'a:bc') {
 	_utf8_on($_);
 	print "$_\t", substr(lc($_), 0);
}

# expected results
$ cat expected_output
a	a
bc	bc

# actual results
$ perl
a	a
bc	b

# golfed test case (should produce 'abc', not 'ab')
$ perl -MEncode=_utf8_on -e '_utf8_on($_),print substr lc,0 for qw<a bc>,$/'

Additional oddness/data:
Affected versions: >=5.8.1
Confirmed unaffected: linux-i686 5.8.0, solaris 5.8.0

Affected functions: only lc and uc (not ucfirst/lcfirst), and only in the
substr(lc(), 0) order; lc(substr($_, 0)) is not affected.
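
To make the call-order observation above easier to check on other builds, here is a minimal sketch that runs both orders for lc and uc side by side. The helper name compare_orders and the input list are assumptions made for illustration and are not part of the original test program; whether this exact form triggers the truncation on an affected perl has not been verified.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw/_utf8_on/;

# compare_orders() is a hypothetical helper, not taken from the report.
# For each input it sets the UTF-8 flag (as the report does) and records
# the result of both call orders for lc and uc.
sub compare_orders {
    my @rows;
    for my $s (@_) {
        my $str = $s;        # work on a copy, leave the caller's data alone
        _utf8_on($str);
        push @rows, {
            input     => $s,
            substr_lc => substr(lc($str), 0),   # order reported as affected
            lc_substr => lc(substr($str, 0)),   # order reported as unaffected
            substr_uc => substr(uc($str), 0),   # order reported as affected
            uc_substr => uc(substr($str, 0)),   # order reported as unaffected
        };
    }
    return @rows;
}

# Flag any row where the substr(lc/uc) result is shorter or longer than the input.
for my $row (compare_orders(qw/a bc def/)) {
    my $ok = length($row->{substr_lc}) == length($row->{input})
          && length($row->{substr_uc}) == length($row->{input});
    printf "%-4s substr(lc)=%-4s lc(substr)=%-4s %s\n",
        $row->{input}, $row->{substr_lc}, $row->{lc_substr},
        $ok ? 'ok' : 'WRONG LENGTH';
}
```

Putting longer strings after shorter ones in the input list matches the pattern in the report, where the second result appeared to be cut to the length of the first.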

[Please do not change anything below this line]
Site configuration information for perl v5.8.7:

Configured by Gentoo at Sat Feb  4 23:34:18 EST 2006.

Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
     osname=linux, osvers=2.6.11-gentoo-r6, archname=i686-linux
     uname='linux elation 2.6.11-gentoo-r6 #4 thu may 12 16:36:25 edt 2005 i686 intel(r) pentium(r) 4 cpu 3.00ghz genuineintel gnulinux '
     config_args='-des -Darchname=i686-linux -Dcccdlflags=-fPIC -Dccdlflags=-rdynamic -Dcc=i686-pc-linux-gnu-gcc -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr -Dlocincpth=  -Doptimize=-O2 -march=pentium4 -fomit-frame-pointer -Duselargefiles -Dd_semctl_semun -Dscriptdir=/usr/bin -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dinstallman1dir=/usr/share/man/man1 -Dinstallman3dir=/usr/share/man/man3 -Dman1ext=1 -Dman3ext=3pm -Dinc_version_list=5.8.0 5.8.0/i686-linux 5.8.2 5.8.2/i686-linux 5.8.4 5.8.4/i686-linux 5.8.5 5.8.5/i686-linux 5.8.6 5.8.6/i686-linux  -Dcf_by=Gentoo -Ud_csh -Di_ndbm -Di_gdbm -Di_db'
     hint=recommended, useposix=true, d_sigaction=define
     usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
     useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
     use64bitint=undef use64bitall=undef uselongdouble=undef
     usemymalloc=n, bincompat5005=undef
     cc='i686-pc-linux-gnu-gcc', ccflags ='-fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
     optimize='-O2 -march=pentium4 -fomit-frame-pointer',
     cppflags='-fno-strict-aliasing -pipe'
     ccversion='', gccversion='3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)', gccosandvers=''
     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
     ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
     alignbytes=4, prototype=define
   Linker and Libraries:
     ld='i686-pc-linux-gnu-gcc', ldflags =' -L/usr/local/lib'
     libpth=/usr/local/lib /lib /usr/lib
     libs=-lpthread -lnsl -lndbm -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
     perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
     libc=/lib/, so=so, useshrplib=false, libperl=libperl.a
   Dynamic Linking:
     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
     cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.8.7:

Environment for perl v5.8.7:
     LANG (unset)
     LANGUAGE (unset)
     LD_LIBRARY_PATH (unset)
     LOGDIR (unset)
     PERL_BADLANG (unset)
