
[perl #56690] Some bugs in Perl regexp (core Perl issues)

From: Hugo van der Sanden via RT
Date: June 26, 2009 08:16
Subject: [perl #56690] Some bugs in Perl regexp (core Perl issues)
Message ID: rt-3.6.HEAD-11910-1246020692-205.56690-15-0@perl.org

On Tue Jul 08 14:49:43 2008, abigail@abigail.be wrote:
> Here are some tests for this bug:
> 
> 
> 
> --- t/op/re_tests.orig	2008-04-11 14:20:20.000000000 +0200
> +++ t/op/re_tests	2008-07-08 18:43:39.000000000 +0200
> @@ -1344,4 +1344,7 @@
>  .*?(?:(\w)|(\w))x	abx	y	$1-$2	b-
>  
>  0{50}	000000000000000000000000000000000000000000000000000	y	-	-
> +# Bug #56690
> +^a?(?=b)b	ab	y	$&	ab
> +^a*(?=b)b	ab	y	$&	ab
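
For readers who want to see the result these two tests expect, here is a minimal sketch using Python's `re` module. It is an assumption that Python is an acceptable stand-in: its engine is unrelated to Perl's and has no start_class optimization, so it only illustrates the intended match (`$&` equal to `ab`), not the bug itself.

```python
import re

# The two patterns added to t/op/re_tests, applied to the subject "ab".
patterns = [r"^a?(?=b)b", r"^a*(?=b)b"]
subject = "ab"

matched = []
for pattern in patterns:
    m = re.search(pattern, subject)
    # group(0) corresponds to Perl's $& (the whole matched text).
    matched.append(m.group(0) if m else None)

print(matched)
```

Both patterns should match the whole subject, which is what the `y  $&  ab` columns in the test lines assert.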


This is caused by a failure of the start_class optimization in the case
of lookahead, as per the attached comment.

In more detail: at the point study_chunk() attempts to deal with the
start_class discovered for the lookahead chunk, we have
SCF_DO_STCLASS_OR set, and_withp has the starting value of ANYOF_EOS |
ANYOF_UNICODE_ALL, and data->start_class has [a] | ANYOF_EOS.

So given:
  start = ANYOF_EOS | ANYOF_UNICODE_ALL
  pre = [a] | ANYOF_EOS
  lookahead = [b]
  post = [b]
what we should be getting is:
  start_class = start & (pre | (lookahead & post))
      = start & (pre | [b])
      = start & [ab]
      = [ab]
but what we are getting is:
  start_class = start & ((pre & lookahead) | post)
      = start & (ANYOF_EOS | post)
      = start & [b]
      = [b]
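
The two derivations above can be checked with a small set-algebra sketch. This is only a model, not the code in study_chunk(): character classes are plain Python sets, ANYOF_UNICODE_ALL is approximated by the ASCII range, and the ANYOF_EOS flag is left out entirely (an assumption made here because the final results above also omit it).

```python
# Hypothetical model: each character class is a set of characters.
ALL = {chr(c) for c in range(128)}  # stand-in for ANYOF_UNICODE_ALL

start = ALL          # starting value of and_withp (EOS flag ignored)
pre = {"a"}          # class collected before the lookahead
lookahead = {"b"}    # class of the lookahead chunk
post = {"b"}         # class of what follows the lookahead

# What the combination should be: the lookahead constrains only what follows it.
expected = start & (pre | (lookahead & post))

# What the message reports is computed instead: the lookahead is ANDed
# with the preceding class, then ORed with the following one.
actual = start & ((pre & lookahead) | post)

first_char = "ab"[0]
print(sorted(expected), sorted(actual))
print(first_char in expected, first_char in actual)
```

In this model the subject `ab` starts with `a`, which is in the expected class but not in the computed one, so an optimizer trusting the computed class would reject a position where the pattern can in fact match.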

In other words, we need to stack an alternation of ANDs and ORs to cope
with this situation, and we don't have a mechanism to do that except to
recurse into study_chunk() some more.

A simpler short-term fix is instead to throw up our hands in this
situation, and just nullify start_class. I'm not sure exactly how to do
that, but it seems the more likely to be achievable for 5.10.1.

Hugo
