
[perl #37999] lc() + Latin-1 chars is failing erratically

From: Daniel Richard G.
Date: December 21, 2005 02:11
Subject: [perl #37999] lc() + Latin-1 chars is failing erratically

# New Ticket Created by  Daniel Richard G. 
# Please include the string:  [perl #37999]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/rt3/Ticket/Display.html?id=37999 >


I have a script that processes a list of words in Latin-1 encoding.
It takes one word from each line, lowercases it, and writes it out.

I found that certain accented letters in a word were not being
lowercased by lc(), even though other (ASCII) letters in the same word
were. At first I thought that an encoding issue was to blame, but after
reducing the script to a minimal test case, I found the problem:

If I chomp() the string before lc()ing it, everything works fine. If I
chop() it first, even though the resulting string is identical, the case
transformation fails. (The same happens if I do neither and keep the
trailing "\n".) Also, if I don't read the input from a file, but merely
place it inline in the program, everything works (with chomp() and chop()
alike).
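
For readers without the attachments, the comparison described above can be
sketched roughly as follows. This is not the attached test script. The
sample word, the read_line() helper, and the in-memory filehandle standing
in for the input file are all assumptions made for illustration. To get
closer to the report, point the open() at a real Latin-1 file instead.

```perl
use strict;
use warnings;

# Hypothetical sample word: "ECOLE" with a leading E-acute, as Latin-1
# bytes (0xC9), followed by a newline as it would appear in a word list.
my $data = "\xC9COLE\n";

# Hypothetical helper: read one line through a filehandle. An in-memory
# handle stands in for the real input file here (an assumption).
sub read_line {
    open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
    my $line = <$fh>;
    close($fh);
    return $line;
}

# Path 1: strip the trailing newline with chomp().
my $chomped = read_line();
chomp($chomped);

# Path 2: strip the last character with chop().
my $chopped = read_line();
chop($chopped);

# The two strings should be identical at this point, so lc() should
# give the same result for both. A mismatch indicates the reported bug.
my $same_input  = ($chomped eq $chopped)         ? 1 : 0;
my $same_output = (lc($chomped) eq lc($chopped)) ? 1 : 0;

print "inputs identical:   $same_input\n";
print "lc() results agree: $same_output\n";
```

The sketch only checks that the chomp() and chop() paths agree with each
other. It makes no claim about what lc() should return for the accented
letter itself, since that depends on how the string is encoded and on
pragmas such as "use locale".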

I am attaching both the test script and input file; please review the
comments in the script. If the script dies with "aaaaaaaack!" then the bug
is present.

This bug has been reproduced with Perl 5.8.x built from development source.
Locale settings do not appear to affect it (happens with LANG=C, etc.).


--Daniel


-- 
NAME   = Daniel Richard G.       ##  Remember, skunks       _\|/_  meef?
EMAIL1 = skunk@iskunk.org        ##  don't smell bad---    (/o|o\) /
EMAIL2 = skunk@alum.mit.edu      ##  it's the people who   < (^),>
WWW    = http://www.******.org/  ##  annoy them that do!    /   \
--
(****** = site not yet online)

