Re: [perl #98546] "the Unicode bug", reversed?

Karl Williamson
October 13, 2011 19:47
On 09/08/2011 07:16 PM, Tom Christiansen wrote:

> I'm sure this is the same bug, but I see that not all
> combinations of FB00 (ff) and FB01 (fi) work.
>      % perl -le 'print "\x{FB00}\x{FB01}" =~ /ff/i || 0'
>      1
>      % perl -le 'print "\x{FB01}\x{FB00}" =~ /ff/i || 0'
>      0
>      % perl -le 'print "\x{FB00}\x{FB01}" =~ /fi/i || 0'
>      0
>      % perl -le 'print "\x{FB01}\x{FB00}" =~ /fi/i || 0'
>      1
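
The four one-liners quoted above can also be run as one script, which makes it easier to compare perl builds. This is only a sketch: the helper name fold_match_table is made up for illustration and is not from the report or the patch, and no output is shown because the results depend on whether the perl being run has the fix.

```perl
use strict;
use warnings;

# The two ligatures from the report: U+FB00 (ff) and U+FB01 (fi),
# in both orders.
my @strings  = ("\x{FB00}\x{FB01}", "\x{FB01}\x{FB00}");
my @patterns = ('ff', 'fi');

# Hypothetical helper: returns one [code points, pattern, 0-or-1] row
# per string/pattern combination, in the same order as the one-liners.
sub fold_match_table {
    my @rows;
    for my $pat (@patterns) {
        for my $str (@strings) {
            my $hex = join ' ', map { sprintf 'U+%04X', ord } split //, $str;
            my $hit = $str =~ /$pat/i ? 1 : 0;
            push @rows, [$hex, $pat, $hit];
        }
    }
    return @rows;
}

printf "%-15s =~ /%s/i  => %d\n", @$_ for fold_match_table();
```

A 0 in any row means that build still fails to find the folded match at that position.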

As I said in an earlier email, this is not the same bug as the ones
involving the sharp s.  The cases quoted above are now fixed, which
should leave this particular symptom (I hope) only for the 3 "tricky"
fold characters and their folds.  Those require significantly more work
than this trivial one-line patch:

commit 7c1b9f38fcbfdb3a9e1766e02bcb991d1a5452d9
  Author: Karl Williamson <>
  Date:   Thu Oct 13 19:56:45 2011 -0600

      regexec.c: Fix "\x{FB01}\x{FB00}" =~ /ff/i

      Only the first character of the string was being checked when scanning
      for the beginning position of the pattern match.

      This was so wrong, it looks like it has to be a regression.  I
      experimented a little and did not find any.  I believe (but am not
      certain) that a multi-char fold has to be involved.  The handling of
      these was so broken before 5.14 that there very well may not be a
