# New Ticket Created by Anders Melchiorsen
# Please include the string: [perl #116148]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=116148 >
On Wed, Dec 19, 2012 at 10:43:04PM +0100, Anders Melchiorsen wrote:
>>> use feature 'unicode_strings';
>>>
>>> my $x = "\x{263a}";
>>> $x =~ /$x/;
>>>
>>> my $text = "Perl";
>>> die if $text !~ /P.*$/i;
>>>
>>> The program does nothing (as expected) in perl 5.14.2, but dies in
>>> perl 5.16.2.
On 12/19/2012 04:02 PM, Dave Mitchell wrote:
>> It bisects to
>>
>> commit 77a6d8568e288ad300ad7f0805946559b4ec28d1
>> Author: Karl Williamson <public@khwilliamson.com>
>> Date: Sun Oct 16 12:47:21 2011 -0600
>>
>> regexec.c: Less work in /i matching
On 20-12-2012 06:07, Karl Williamson wrote:
> I looked at this a little. The bug isn't from that commit, which is just
> exposing an underlying bug. The problem is that for the second match,
> UTF_PATTERN in regexec.c is wrongly evaluating to TRUE. It is defined as
>
> #define UTF_PATTERN ((PL_reg_flags & RF_utf8) != 0)
>
> so that means that PL_reg_flags is set wrong. It turns out it is wrong at
> the beginning of re_intuit_start(). Somehow the utf8ness of the first
> pattern is getting passed in this global as if it applied to the 2nd pattern.
> Grepping the project's source indicates that the only setting of this global
> is done in regexec.c.
>
> This looks to me like a global that shouldn't be, like PL_regsize, which Dave
> just removed. I didn't see anyplace where it gets initialized, which would
> indicate that once the program sees a UTF-8 pattern, it thinks all patterns
> are UTF-8. That bug seems unlikely to have escaped detection until now, so
> I don't understand.
>
> This is a portion of the regex handling that I know nothing about. Perhaps
> this will help someone who does know this logic better to easily figure out
> the cause and fix.
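
The failure mode Karl suspects (a global flag that is set when a UTF-8 pattern is seen and never reset for the next pattern) can be illustrated with a small C sketch. This is not the code in regexec.c. `reg_flags`, `struct pattern`, `match_buggy()`, `match_fixed()` and `demo_stale_flag()` are hypothetical names invented for the illustration. Only the `RF_utf8` name and the shape of the `UTF_PATTERN` macro are taken from the message above, and the bit value used here is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit name borrowed from the thread; the value is assumed for illustration. */
#define RF_utf8 8u

/* Hypothetical stand-in for the interpreter-wide PL_reg_flags. */
static unsigned int reg_flags = 0;

/* Same shape as the macro quoted above, reading the stand-in global. */
#define UTF_PATTERN ((reg_flags & RF_utf8) != 0)

/* Hypothetical, heavily simplified compiled pattern. */
struct pattern {
    bool is_utf8;
};

/* Buggy entry point: sets the bit for a UTF-8 pattern but never clears it,
 * so the state of an earlier match leaks into later ones.
 * Returns what the engine would believe about the current pattern. */
static bool match_buggy(const struct pattern *p)
{
    if (p->is_utf8)
        reg_flags |= RF_utf8;
    return UTF_PATTERN;
}

/* One possible repair: derive the bit from the current pattern on every
 * entry, so nothing carries over from a previous match. */
static bool match_fixed(const struct pattern *p)
{
    if (p->is_utf8)
        reg_flags |= RF_utf8;
    else
        reg_flags &= ~RF_utf8;
    return UTF_PATTERN;
}

/* Mirrors the test case: a UTF-8 pattern first, then a plain ASCII one. */
static void demo_stale_flag(void)
{
    struct pattern smiley = { true };   /* like /\x{263a}/ */
    struct pattern ascii  = { false };  /* like /P.*$/i    */

    reg_flags = 0;
    assert(match_buggy(&smiley));
    assert(match_buggy(&ascii));        /* stale bit: wrongly seen as UTF-8 */

    reg_flags = 0;
    assert(match_fixed(&smiley));
    assert(!match_fixed(&ascii));       /* bit re-derived per pattern */
}
```

Calling `demo_stale_flag()` exercises both variants. Whether the real fix should reset the flag on entry or stop using a global altogether is exactly the open question in the message above; the sketch only shows why a missing reset would produce the reported symptom.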