
[perl #108638] Re: "the Unicode bug", reversed?

From: karl williamson
Date: January 19, 2012 12:34
Subject: [perl #108638] Re: "the Unicode bug", reversed?
Message ID: rt-3.6.HEAD-14510-1327005239-64.108638-75-0@perl.org

# New Ticket Created by  karl williamson 
# Please include the string:  [perl #108638]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=108638 >


On 09/06/2011 09:28 AM, Tom Christiansen wrote:
> Summary: If you use -E, matches fail that work fine under -e.  This is
>          in some sense the opposite of the Unicode bug, which normally
>          works the other way around.
>
> Matthew Barnett, who is implementing full casefolding in Python,
> initially reported to me these Perl bugs:
>
>      However, these match:
>
> 	 "\N{LATIN SMALL LETTER SHARP S}" =~ /ss/i
> 	 "\N{LATIN SMALL LIGATURE LONG S T}" =~ /st/i
> 	 "\N{LATIN SMALL LIGATURE ST}" =~ /st/i
> 	 "\N{LATIN SMALL LETTER SHARP S}t" =~ /sst/i
>
>      but these don't match:
>
> 	 "s\N{LATIN SMALL LIGATURE LONG S T}" =~ /sst/i
> 	 "s\N{LATIN SMALL LIGATURE ST}" =~ /sst/i
>
>      I think what might be happening is that it isn't handling the
>      possibility of overlapping full case-folding.
>
>      When it sees "sst" in the regex it identifies "ss" as a possible result
>      of full case-folding and so adds the unfolded alternative:
>
> 	 ss =>  ss|\N{LATIN SMALL LETTER SHARP S}
>
>      but it then doesn't identify "st" as another possible result of full
>      case-folding, so it doesn't add the unfolded alternative (either of
>      them, in fact):
>
> 	 st =>  st|\N{LATIN SMALL LIGATURE ST}
>
>      It should be doing:
>
> 	sst =>  sst|\N{LATIN SMALL LETTER SHARP S}t|s\N{LATIN SMALL LIGATURE ST}
>
>      (Again, I'm ignoring the other alternative.)
>
> And it is indeed true that those two test cases fail, under both 5.14 and blead:
>
>      This is perl 5, version 14, subversion 0 (v5.14.0) built for darwin-2level
>
>      This is perl 5, version 15, subversion 2 (v5.15.2-264-g87e4a53) built for darwin-2level
>
> As shown here:
>
>      % perl -Mcharnames=:full -lE 'print "s\N{LATIN SMALL LIGATURE LONG S T}" =~ /sst/i ? "Pass" : "Fail"'
>      Fail
>      % perl -Mcharnames=:full -lE 'print "s\N{LATIN SMALL LIGATURE ST}"       =~ /sst/i ? "Pass" : "Fail"'
>      Fail
>
> However, merely change the -E to a -e, and suddenly they work!
>
>      % perl -Mcharnames=:full -le 'print "s\N{LATIN SMALL LIGATURE LONG S T}" =~ /sst/i ? "Pass" : "Fail"'
>      Pass
>      % perl -Mcharnames=:full -le 'print "s\N{LATIN SMALL LIGATURE ST}"       =~ /sst/i ? "Pass" : "Fail"'
>      Pass
>
> So it looks like this is some reverse Unicode bug.  Very strange.
>
> For the record, Ruby does get these right:
>
>      % ruby -le 'print "s\uFB05"   =~ /sst/i ? "Pass" : "Fail"'
>      Pass
>      % ruby -le 'print "s\uFB06"   =~ /sst/i ? "Pass" : "Fail"'
>      Pass
>
> Where that is:
>
>      % ruby -v
>      ruby 1.9.2p0 (2010-08-18 revision 29036) [i386-darwin9.8.0]
>
> Here are other, probably related issues:
>
>      % perl  -lE 'print       "\x{FB05}" =~ /st/i ? "Pass" : "Fail"'
>      Pass
>      % perl  -lE 'print "\x{DF}\x{FB05}" =~ /st/i ? "Pass" : "Fail"'
>      Fail
>      % blead -lE 'print "\x{DF}\x{FB05}" =~ /st/i ? "Pass" : "Fail"'
>      Fail
>
> However, unlike the earlier examples, *those* do *not* suddenly pass if
> you use -e instead of -E:
>
>      % perl  -le 'print "\x{DF}\x{FB05}" =~ /st/i ? "Pass" : "Fail"'
>      Fail
>      % blead -le 'print "\x{DF}\x{FB05}" =~ /st/i ? "Pass" : "Fail"'
>      Fail
>
> See; it still fails.  Very strange.  They work fine in Ruby:
>
>      % ruby -le 'print       "\uFB05" =~ /st/i ? "Pass" : "Fail"'
>      Pass
>      % ruby -le 'print "\u00DF\uFB05" =~ /st/i ? "Pass" : "Fail"'
>      Pass
>
> Like Perl, Ruby does *not* do partial matches of full casefolds
> (I don't think the idea makes sense), so it's not like it's going
> totally overboard with full casefolding:
>
>      % perl -lE 'print "\x{DF}\x{FB05}" =~ /ssst/i ? "Pass" : "Fail"'
>      Pass
>      % ruby -le 'print "\u00DF\uFB05" =~ /ssst/i ? "Pass" : "Fail"'
>      Pass
>
>      % perl -lE 'print "\x{DF}\x{FB05}" =~ /sst/i ? "Pass" : "Fail"'
>      Fail
>      % ruby -le 'print "\u00DF\uFB05" =~ /sst/i ? "Pass" : "Fail"'
>      Fail
>
> Which is as expected.  The others aren't.
>
> --tom
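
The expansion described in the quoted report (sst => sst|\N{LATIN SMALL LETTER SHARP S}t|s\N{LATIN SMALL LIGATURE ST}) can be sketched outside Perl. The following is a minimal illustration in Python, chosen only because the original report came from work on casefolding in Python. It is not how Perl's regex compiler is implemented. The names FOLDS_TO and unfolded_alternatives are hypothetical, and the table covers only the three characters discussed in this thread.

```python
# Reverse map: folded sequence -> characters whose full case fold is that
# sequence. Only the three characters from this thread are included; a real
# table would be derived from Unicode's CaseFolding.txt.
FOLDS_TO = {}
for ch in ["\u00DF", "\uFB05", "\uFB06"]:  # sharp s, long s t ligature, st ligature
    FOLDS_TO.setdefault(ch.casefold(), []).append(ch)


def unfolded_alternatives(folded):
    """Return every string (over this small table) that folds to `folded`.

    At each position the function tries both the literal character and any
    multi-character fold that starts there, so overlapping candidates such
    as "ss" and "st" inside "sst" are both considered.
    """
    if not folded:
        return {""}
    results = set()
    # Option 1: keep the first character as it is.
    for rest in unfolded_alternatives(folded[1:]):
        results.add(folded[0] + rest)
    # Option 2: replace a leading folded sequence by one of its sources.
    for seq, sources in FOLDS_TO.items():
        if folded.startswith(seq):
            for rest in unfolded_alternatives(folded[len(seq):]):
                for src in sources:
                    results.add(src + rest)
    return results


for alt in sorted(unfolded_alternatives("sst")):
    print(ascii(alt))
```

For "sst" this yields four alternatives: the literal string, sharp s followed by t, and s followed by either of the two ligatures. The last two are the cases the report says were being missed.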


I believe that these are all now fixed in blead.
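
For reference, the results the quoted message calls "expected" follow one rule: a literal pattern matches caselessly when its full case fold equals the concatenated folds of a run of whole characters in the target, and a match may not begin or end in the middle of one character's multi-character fold. A minimal sketch of that rule in Python, assuming literal patterns only (no regex syntax) and ignoring normalization; caseless_search is a hypothetical name, and this is not Perl's implementation:

```python
def caseless_search(text, pattern):
    """Report whether `pattern` occurs in `text` under full case folding.

    Each character of `text` is folded on its own, and a match must consume
    whole characters: it cannot start or stop inside the multi-character
    fold of a single character such as U+00DF.
    """
    folded_pattern = pattern.casefold()
    if not folded_pattern:
        return True
    chunks = [ch.casefold() for ch in text]
    for start in range(len(chunks)):
        acc = ""
        for end in range(start, len(chunks)):
            acc += chunks[end]
            if acc == folded_pattern:
                return True
            if not folded_pattern.startswith(acc):
                break  # diverged or overshot; try the next start position
    return False


cases = [
    ("s\uFB05", "sst"),
    ("s\uFB06", "sst"),
    ("\u00DF\uFB05", "st"),
    ("\u00DF\uFB05", "ssst"),
    ("\u00DF\uFB05", "sst"),  # would need half of the sharp s fold
]
for text, pattern in cases:
    print(ascii(text), pattern, "Pass" if caseless_search(text, pattern) else "Fail")
```

Under this rule only the last case fails, which agrees with the behaviour the quoted message describes as correct.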

