On Tue, Apr 08, 2003 at 10:03:58AM -0700, Robert Spier wrote:
> Jon Rifkin (via RT) wrote
> > Note the second output line, where there should be a '12345' there is
> > instead a '0c0'.  This problem occurs regardless of the input
> > data unless there are blank lines, and the error always occurs on
> > the second output line starting at the second column.
>
> Weird--
>
> I see you're using RH8.  This looks like a UTF8 Locale issue.  RedHat
> 8 changed the default encoding to utf8.
>
> I can easily replicate this on my RH7.3 box when explicitly setting LANG.
>
> echo 0123456789 0123456789 | LANG=en_US.utf8 perl -lne 'print lc($1) while /(\d+)/g'
> 0123456789
> 00c06789
>
> I believe this is smarter in perl5.8.1-to-be and 5.9.1-to-be.

I can replicate it with a UTF-8 locale with 5.8.0 on SuSE.
I cannot replicate it with a UTF-8 locale with 5.8.1-to-be and the flags
to enable the 5.8.0 locale behaviour. I think it's actually the B0B bug
on 5.8.0.

$ LANG=en_GB.utf8 perl5.8.0 -ne 'print lc($1),"\n" if /^(.*)$/' <~/test/Message
0123456789
00c06789
0123456789
$ LANG=en_GB.utf8 ./perl -C63 -Ilib -ne 'print lc($1),"\n" if /^(.*)$/' <~/test/Message
0123456789
0123456789
0123456789

The workaround for 5.8.0 is to change LANG away from a UTF-8 locale.

Nicholas Clark