[perl #38619] Bug in lc and uc (interaction between UTF-8, substr, and lc/uc)

From: perl@benizi.com
Date: February 23, 2006 01:08
Subject: [perl #38619] Bug in lc and uc (interaction between UTF-8, substr, and lc/uc)
Message ID: rt-3.0.11-38619-130499.13.7246577431753@perl.org
# New Ticket Created by  perl@benizi.com 
# Please include the string:  [perl #38619]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/rt3/Ticket/Display.html?id=38619 >


This is a bug report for perl from perl@benizi.com,
generated with the help of perlbug 1.35 running under perl v5.8.7.


-----------------------------------------------------------------
[Please enter your report here]

Problem with lc/uc interacting with substr and _utf8_on.

When substr(lc($var), 0) is evaluated repeatedly on _utf8_on'ed values of
$var, the second result has the wrong length: in preliminary testing it
appears to be limited to the length of the first substr(lc($var), 0) result.
Adding further iterations leads to further weirdness. The test program below
can be called as:

perl bug.pl [test-string]
The test string is split on /:/ and defaults to 'a:bc'.

For each string in the split:
  _utf8_on, and print string <TAB> substr(lc(string), 0)

Output should be:
  string1 <TAB> string1
  string2 <TAB> string2
  ...

Actual output is:
  string1 <TAB> string1
  string2 <TAB> string3
  ...
(where string3 is the first length(string1) characters of string2)

# sample program demonstrating problem
$ cat bug.pl
#!/usr/bin/perl -l
use strict;
use warnings;
use Encode qw/_utf8_on/;
for (split /:/, shift||'a:bc') {
    _utf8_on($_);
    print "$_\t", substr(lc($_), 0);
}

# expected results
$ cat expected_output
a	a
bc	bc

# actual results
$ perl bug.pl
a	a
bc	b

# golfed test case (should produce 'abc', not 'ab')
$ perl -MEncode=_utf8_on -e '_utf8_on($_),print substr lc,0 for qw<a bc>,$/'
ab


Additional oddness/data:
Affected versions: >=5.8.1
Confirmed unaffected: linux-i686 5.8.0, solaris 5.8.0

Affected functions: only lc/uc (not ucfirst/lcfirst), and only in the
substr(lc(), 0) order; lc(substr($_, 0)) is not affected.
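
The observations above can be checked side by side with a small script. The
following is only a sketch, not part of the original test case: the
check_variants helper, the @variants table and the ok/TRUNCATED labels are
names made up for illustration. It runs each call order over the same
strings, in the same order as bug.pl, and compares the output length with
the input length.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw/_utf8_on/;

# Hypothetical table (not from the original report): each entry pairs a
# label with a code ref applying one of the call orders discussed above.
my @variants = (
    [ 'substr(lc($s), 0)',      sub { substr(lc($_[0]), 0) } ],
    [ 'substr(uc($s), 0)',      sub { substr(uc($_[0]), 0) } ],
    [ 'lc(substr($s, 0))',      sub { lc(substr($_[0], 0)) } ],
    [ 'substr(lcfirst($s), 0)', sub { substr(lcfirst($_[0]), 0) } ],
    [ 'substr(ucfirst($s), 0)', sub { substr(ucfirst($_[0]), 0) } ],
);

# Run every variant over the strings in order.
# Returns a list of [label, input, output] array refs.
sub check_variants {
    my @strings = @_;
    my @rows;
    for my $variant (@variants) {
        my ($label, $code) = @$variant;
        for my $str (@strings) {
            my $s = $str;      # copy, so the caller's strings stay untouched
            _utf8_on($s);      # same flagging step as in bug.pl
            my $out = $code->($s);
            push @rows, [ $label, $s, $out ];
        }
    }
    return @rows;
}

# Same command-line convention as bug.pl: optional test string, split on /:/.
for my $row (check_variants(split /:/, shift || 'a:bc')) {
    my ($label, $in, $out) = @$row;
    my $status = length($out) == length($in) ? 'ok' : 'TRUNCATED';
    printf "%-24s %-4s -> %-4s %s\n", $label, $in, $out, $status;
}
```

Saved under any name (say variants.pl, a placeholder), it takes the same
optional argument as bug.pl. If the observations above hold, an affected perl
should report TRUNCATED only on the substr(lc(...)) and substr(uc(...)) rows
for the second string; an unaffected perl should report ok on every row.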

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
     category=core
     severity=low
---
Site configuration information for perl v5.8.7:

Configured by Gentoo at Sat Feb  4 23:34:18 EST 2006.

Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
   Platform:
     osname=linux, osvers=2.6.11-gentoo-r6, archname=i686-linux
     uname='linux elation 2.6.11-gentoo-r6 #4 thu may 12 16:36:25 edt 2005 i686 intel(r) pentium(r) 4 cpu 3.00ghz genuineintel gnulinux '
     config_args='-des -Darchname=i686-linux -Dcccdlflags=-fPIC -Dccdlflags=-rdynamic -Dcc=i686-pc-linux-gnu-gcc -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr -Dlocincpth=  -Doptimize=-O2 -march=pentium4 -fomit-frame-pointer -Duselargefiles -Dd_semctl_semun -Dscriptdir=/usr/bin -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dinstallman1dir=/usr/share/man/man1 -Dinstallman3dir=/usr/share/man/man3 -Dman1ext=1 -Dman3ext=3pm -Dinc_version_list=5.8.0 5.8.0/i686-linux 5.8.2 5.8.2/i686-linux 5.8.4 5.8.4/i686-linux 5.8.5 5.8.5/i686-linux 5.8.6 5.8.6/i686-linux  -Dcf_by=Gentoo -Ud_csh -Di_ndbm -Di_gdbm -Di_db'
     hint=recommended, useposix=true, d_sigaction=define
     usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
     useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
     use64bitint=undef use64bitall=undef uselongdouble=undef
     usemymalloc=n, bincompat5005=undef
   Compiler:
     cc='i686-pc-linux-gnu-gcc', ccflags ='-fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
     optimize='-O2 -march=pentium4 -fomit-frame-pointer',
     cppflags='-fno-strict-aliasing -pipe'
     ccversion='', gccversion='3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)', gccosandvers=''
     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
     ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
     alignbytes=4, prototype=define
   Linker and Libraries:
     ld='i686-pc-linux-gnu-gcc', ldflags =' -L/usr/local/lib'
     libpth=/usr/local/lib /lib /usr/lib
     libs=-lpthread -lnsl -lndbm -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
     perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
     libc=/lib/libc-2.3.5.so, so=so, useshrplib=false, libperl=libperl.a
     gnulibc_version='2.3.5'
   Dynamic Linking:
     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
     cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:


---
@INC for perl v5.8.7:
     /etc/perl
     /usr/lib/perl5/site_perl/5.8.7/i686-linux
     /usr/lib/perl5/site_perl/5.8.7
     /usr/lib/perl5/site_perl/5.8.5
     /usr/lib/perl5/site_perl/5.8.5/i686-linux
     /usr/lib/perl5/site_perl/5.8.6
     /usr/lib/perl5/site_perl/5.8.6/i686-linux
     /usr/lib/perl5/site_perl
     /usr/lib/perl5/vendor_perl/5.8.7/i686-linux
     /usr/lib/perl5/vendor_perl/5.8.7
     /usr/lib/perl5/vendor_perl/5.8.5
     /usr/lib/perl5/vendor_perl/5.8.5/i686-linux
     /usr/lib/perl5/vendor_perl/5.8.6
     /usr/lib/perl5/vendor_perl/5.8.6/i686-linux
     /usr/lib/perl5/vendor_perl
     /usr/lib/perl5/5.8.7/i686-linux
     /usr/lib/perl5/5.8.7
     /usr/local/lib/site_perl
     .

---
Environment for perl v5.8.7:
     HOME=/home/bhaskell
     LANG (unset)
     LANGUAGE (unset)
     LD_LIBRARY_PATH (unset)
     LOGDIR (unset)
     PATH=/home/bhaskell/bin:/home/bhaskell/wn/bin:/usr/kde/3.4/bin:/bin:/usr/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/3.4.4:/opt/ati/bin:/opt/ghc/bin:/opt/blackdown-jdk-1.4.2.02/bin:/opt/blackdown-jdk-1.4.2.02/jre/bin:/usr/qt/3/bin:/usr/kde/3.4/bin:/usr/kde/3.3/bin:/usr/games/bin:/var/qmail/bin:/usr/cogsci/bin:/people/bhaskell/bin
     PERL_BADLANG (unset)
     SHELL=/bin/zsh

