develooper Front page | perl.perl5.porters | Postings from December 2012

[perl #116148] Pattern utf8ness sticks around globally

Anders Melchiorsen
December 22, 2012 20:23
Message ID:
# New Ticket Created by  Anders Melchiorsen 
# Please include the string:  [perl #116148]
# in the subject line of all future correspondence about this issue. 
# <URL: >

On Wed, Dec 19, 2012 at 10:43:04PM +0100, Anders Melchiorsen wrote:

 >>> use feature 'unicode_strings';
 >>> my $x = "\x{263a}";
 >>> $x =~ /$x/;
 >>> my $text = "Perl";
 >>> die if $text !~ /P.*$/i;
 >>> The program does nothing (as expected) in perl 5.14.2, but dies in
 >>> perl 5.16.2.

On 12/19/2012 04:02 PM, Dave Mitchell wrote:

 >> It bisects to
 >> commit 77a6d8568e288ad300ad7f0805946559b4ec28d1
 >> Author: Karl Williamson <>
 >> Date:   Sun Oct 16 12:47:21 2011 -0600
 >>      regexec.c: Less work in /i matching

On 20-12-2012 06:07, Karl Williamson wrote:

 > I looked at this a little.  The bug isn't from that commit, which is just
 > exposing an underlying bug.  The problem is that for the second match,
 > UTF_PATTERN in regexec.c is wrongly evaluating to TRUE.  It is
 > #define UTF_PATTERN ((PL_reg_flags & RF_utf8) != 0)
 > so that means that PL_reg_flags is set wrong.  It turns out it is wrong at
 > the beginning of re_intuit_start().  Somehow the utf8ness of the first
 > pattern is getting passed in this global as if it applied to the 2nd.
 >
 > Grepping the project's source indicates that the only setting of this flag
 > is done in regexec.c.
 >
 > This looks to me like a global that shouldn't be, like PL_regsize, which
 > Dave just removed.  I didn't see any place where it gets initialized, which
 > would indicate that once the program sees a UTF-8 pattern, it thinks all
 > patterns are UTF-8.  That bug seems unlikely to have escaped detection
 > until now, so I don't understand.
 >
 > This is a portion of the regex handling that I know nothing about.
 > Hopefully this will help someone who knows this logic better to easily
 > figure out the cause and a fix.
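To make the suspected failure mode concrete, here is a minimal C sketch of a global flags word that is set when a UTF-8 pattern is seen and never cleared. It only models the shape of the problem described above; it is not Perl's regexec.c and not the eventual fix. All names (`sketch_reg_flags`, `SKETCH_RF_UTF8`, `SKETCH_UTF_PATTERN`, `struct sketch_pattern`, `start_match_buggy`, `start_match_fixed`) are hypothetical, loosely echoing `PL_reg_flags`, `RF_utf8` and `UTF_PATTERN`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL model, not Perl's real regexec.c: one process-wide flags
 * word, loosely named after PL_reg_flags / RF_utf8, is supposed to say
 * whether the pattern currently being matched is UTF-8. */
#define SKETCH_RF_UTF8 0x1u

static unsigned sketch_reg_flags = 0; /* shared by every match */

/* Same shape as: #define UTF_PATTERN ((PL_reg_flags & RF_utf8) != 0) */
#define SKETCH_UTF_PATTERN ((sketch_reg_flags & SKETCH_RF_UTF8) != 0)

/* Hypothetical pattern object reduced to the one property that matters. */
struct sketch_pattern {
    const char *source;
    bool is_utf8;
};

/* Suspected bug: the bit is set for a UTF-8 pattern but never cleared,
 * so a later non-UTF-8 pattern still sees SKETCH_UTF_PATTERN as true. */
static bool start_match_buggy(const struct sketch_pattern *pat)
{
    assert(pat != NULL);
    if (pat->is_utf8)
        sketch_reg_flags |= SKETCH_RF_UTF8;
    return SKETCH_UTF_PATTERN; /* what the matcher will believe */
}

/* One way to avoid the leak in this model: recompute the bit from the
 * pattern on every call, so nothing carries over from the previous match. */
static bool start_match_fixed(const struct sketch_pattern *pat)
{
    assert(pat != NULL);
    sketch_reg_flags = pat->is_utf8 ? SKETCH_RF_UTF8 : 0u;
    return SKETCH_UTF_PATTERN;
}
```

Starting from a cleared flags word, passing a U+263A pattern and then an ASCII pattern such as "P.*$" through `start_match_buggy` reports UTF-8 both times, which matches the symptom in the report; `start_match_fixed` reports true and then false.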
