On Tue Jul 08 14:49:43 2008, abigail@abigail.be wrote:
> Here are some tests for this bug:
>
>
>
> --- t/op/re_tests.orig 2008-04-11 14:20:20.000000000 +0200
> +++ t/op/re_tests 2008-07-08 18:43:39.000000000 +0200
> @@ -1344,4 +1344,7 @@
> .*?(?:(\w)|(\w))x abx y $1-$2 b-
>
> 0{50} 000000000000000000000000000000000000000000000000000 y - -
> +# Bug #56690
> +^a?(?=b)b ab y $& ab
> +^a*(?=b)b ab y $& ab
This is caused by a failure of the start_class optimization in the case
of lookahead, as per the attached comment.
In more detail: at the point study_chunk() attempts to deal with the
start_class discovered for the lookahead chunk, we have
SCF_DO_STCLASS_OR set, and_withp has the starting value of ANYOF_EOS |
ANYOF_UNICODE_ALL, and data->start_class has [a] | ANYOF_EOS.
So given:

  start     = ANYOF_EOS | ANYOF_UNICODE_ALL
  pre       = [a] | ANYOF_EOS
  lookahead = [b]
  post      = [b]

what we should be getting is:

  start_class = start & (pre | (lookahead & post))
              = start & (pre | [b])
              = start & [ab]
              = [ab]

but what we are getting is:

  start_class = start & ((pre & lookahead) | post)
              = start & (ANYOF_EOS | post)
              = start & [b]
              = [b]
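
The two derivations above can be checked mechanically with ordinary set algebra. What follows is only a toy Python sketch, not code from regcomp.c: character classes are modelled as plain sets of lowercase letters, the ANYOF_EOS flag is left out for simplicity, and the function names intended() and observed() are made up for illustration.

```python
import string

# Toy stand-in for ANYOF_UNICODE_ALL: every character in a small alphabet.
# (Assumption: the ANYOF_EOS flag is not modelled at all.)
ALPHABET = frozenset(string.ascii_lowercase)

start = ALPHABET
pre = frozenset("a")        # class collected before the lookahead: [a]
lookahead = frozenset("b")  # class of the lookahead chunk: [b]
post = frozenset("b")       # class of what follows the lookahead: [b]


def intended(start, pre, lookahead, post):
    """Grouping described as correct: start & (pre | (lookahead & post))."""
    return start & (pre | (lookahead & post))


def observed(start, pre, lookahead, post):
    """Grouping actually computed: start & ((pre & lookahead) | post)."""
    return start & ((pre & lookahead) | post)


want = intended(start, pre, lookahead, post)
got = observed(start, pre, lookahead, post)

print(sorted(want))  # ['a', 'b']
print(sorted(got))   # ['b']
```

The second result has lost 'a', so a string starting with "a" is rejected before the real matcher ever runs, which is consistent with the failing tests for "ab".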
In other words, we need to stack an alternation of ANDs and ORs to cope
with this situation, and we don't have a mechanism to do that except to
recurse into study_chunk() some more.
A simpler short-term fix is to give up on the optimization in this
situation and just nullify start_class. I'm not sure exactly how to do
that, but it seems more likely to be achievable for 5.10.1.
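
To illustrate why dropping the start class would be safe, here is another toy Python sketch. It assumes that "nullify" means "apply no start-class filter at all", which is my reading rather than anything taken from the source; anchored_match() is a hypothetical helper, and Python's own re module stands in for the real matcher.

```python
import re

# The pattern from the new tests, minus the leading ^ (re.match already
# anchors at the start of the string).
PATTERN = re.compile(r"a?(?=b)b")


def anchored_match(text, start_class):
    """Try an anchored match, optionally pre-filtered by a start class.

    start_class is a set of characters allowed at position 0, or None to
    skip the pre-filter entirely (the "nullified" case).
    """
    if start_class is not None:
        if not text or text[0] not in start_class:
            return None  # rejected by the optimization, matcher never runs
    m = PATTERN.match(text)
    return m.group(0) if m else None


print(anchored_match("ab", frozenset("b")))   # None: the too-small class
print(anchored_match("ab", frozenset("ab")))  # ab: the correct class
print(anchored_match("ab", None))             # ab: no start class at all
```

A start class that is too large, or absent, only costs the time of some extra match attempts; one that is too small produces wrong answers, which is the behaviour reported in this bug.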
Hugo