
[perl #116148] Pattern utf8ness sticks around globally

From: Anders Melchiorsen
Date: December 22, 2012 20:23
Subject: [perl #116148] Pattern utf8ness sticks around globally
Message ID: rt-3.6.HEAD-17500-1355989711-1679.116148-75-0@perl.org
# New Ticket Created by  Anders Melchiorsen 
# Please include the string:  [perl #116148]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=116148 >


On Wed, Dec 19, 2012 at 10:43:04PM +0100, Anders Melchiorsen wrote:

 >>> use feature 'unicode_strings';
 >>>
 >>> my $x = "\x{263a}";
 >>> $x =~ /$x/;
 >>>
 >>> my $text = "Perl";
 >>> die if $text !~ /P.*$/i;
 >>>
 >>> The program does nothing (as expected) in perl 5.14.2, but dies in
 >>> perl 5.16.2.


On 12/19/2012 04:02 PM, Dave Mitchell wrote:

 >> It bisects to
 >>
 >> commit 77a6d8568e288ad300ad7f0805946559b4ec28d1
 >> Author: Karl Williamson <public@khwilliamson.com>
 >> Date:   Sun Oct 16 12:47:21 2011 -0600
 >>
 >>      regexec.c: Less work in /i matching


On 20-12-2012 06:07, Karl Williamson wrote:

 > I looked at this a little.  The bug isn't from that commit, which is just
 > exposing an underlying bug.  The problem is that for the second match,
 > UTF_PATTERN in regexec.c is wrongly evaluating to TRUE.  It is
 >
 > #define UTF_PATTERN ((PL_reg_flags & RF_utf8) != 0)
 >
 > so that means that PL_reg_flags is set wrong.  It turns out it is wrong at
 > the beginning of re_intuit_start().  Somehow the utf8ness of the first
 > pattern is getting passed in this global as if it applied to the 2nd pattern.
 > Grepping the project's source indicates that the only setting of this global
 > is done in regexec.c.
 >
 > This looks to me like a global that shouldn't be, like PL_regsize, which Dave
 > just removed.  I didn't see anyplace where it gets initialized, which would
 > indicate that once the program sees a UTF-8 pattern, it thinks all patterns
 > are UTF-8.  That bug seems unlikely to have escaped detection until now, so
 > I don't understand.
 >
 > This is a portion of the regex handling that I know nothing about.  Perhaps
 > this will help someone who does know this logic better to easily figure out
 > the cause and fix.
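
The failure mode suspected above (a global flag that is set for one pattern
and never reset before the next) can be illustrated with a small C sketch.
This is not code from regexec.c and not a confirmed diagnosis: reg_flags,
RF_UTF8, match_sticky() and match_reset() are hypothetical names that only
model the macro quoted above.

```c
/* Hypothetical stand-ins; NOT the real PL_reg_flags / RF_utf8 from perl. */
#define RF_UTF8 0x1u

static unsigned int reg_flags = 0;   /* global shared by every match */

/* Same shape as the UTF_PATTERN macro quoted in the message. */
#define UTF_PATTERN ((reg_flags & RF_UTF8) != 0)

/* Sticky variant: sets the bit for a UTF-8 pattern but never clears it,
 * so a later non-UTF-8 pattern is still treated as UTF-8.
 * Returns what the matcher would believe about the current pattern. */
static int match_sticky(int pattern_is_utf8)
{
    if (pattern_is_utf8)
        reg_flags |= RF_UTF8;
    return UTF_PATTERN;
}

/* Reset variant: clears the bit on entry, so each match sees only the
 * utf8ness of its own pattern. */
static int match_reset(int pattern_is_utf8)
{
    reg_flags &= ~RF_UTF8;
    if (pattern_is_utf8)
        reg_flags |= RF_UTF8;
    return UTF_PATTERN;
}
```

In this sketch, calling match_sticky(1) and then match_sticky(0) returns 1
both times, which mirrors the second match in the report seeing the first
pattern's utf8ness. match_reset(1) followed by match_reset(0) returns 1 and
then 0. Keeping the flag in per-match state rather than in a global would
avoid the leak in the same way.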

