develooper Front page | perl.perl5.porters | Postings from April 2003

Re: [perl #21874] Perl bug involving $1 and lc()

From: Nicholas Clark
Date: April 9, 2003 07:22
Subject: Re: [perl #21874] Perl bug involving $1 and lc()
Message ID: 20030409151250.F1186@plum.flirble.org

On Tue, Apr 08, 2003 at 10:03:58AM -0700, Robert Spier wrote:
> Jon Rifkin (via RT) wrote
> > Note the second output line, where there should be a '12345' there is
> > instead a '0c0'.  This problem occurs regardless of the input
> > data unless there are blank lines, and the error always occurs on
> > the second output line starting at the second column.
> 
> Weird--
> 
>    I see you're using RH8.  This looks like a UTF8 Locale issue.  RedHat 8
> changed the default encoding to utf8.
> 
>   I can easily replicate this on my RH7.3 box when explicitly setting LANG.
> 
>   echo 0123456789 0123456789 | LANG=en_US.utf8 perl -lne 'print lc($1) while /(\d+)/g'
> 0123456789
> 00c06789
> 
> I believe this is smarter in perl5.8.1-to-be and 5.9.1-to-be.

I can replicate it with a UTF-8 locale with 5.8.0 on SuSE. I cannot
replicate it with a UTF-8 locale with 5.8.1-to-be, even with the flags that
enable the 5.8.0 locale behaviour. I think it's actually the B0B bug in
5.8.0.

$ LANG=en_GB.utf8 perl5.8.0 -ne 'print lc($1),"\n" if /^(.*)$/' <~/test/Message
0123456789
00c06789
0123456789

$ LANG=en_GB.utf8 ./perl -C63 -Ilib -ne 'print lc($1),"\n" if /^(.*)$/' <~/test/Message 
0123456789
0123456789
0123456789

The workaround for 5.8.0 is to change LANG to something other than a
UTF-8 locale.

Nicholas Clark
