develooper Front page | perl.perl5.porters | Postings from January 2012

Re: whither study()?

Nicholas Clark
January 31, 2012 12:12
On Tue, Jan 31, 2012 at 05:28:02PM +0100, Zsbán Ambrus wrote:
> On 1/31/12, demerphq <> wrote:
> > I wonder if we will
> > get "bug" reports about people's patterns starting to match correctly?
> > :-)
> Can happen, see this as an example:


It happens that that regression was introduced by this commit:

commit 07be1b83a6b2d24b492356181ddf70e1c7917ae3
Author: Yves Orton <>
Date:   Fri Jun 9 02:56:37 2006 +0200

    Re: [PATCH] Better version of the Aho-Corasick patch and lots of benchmarks.
    Message-ID: <>

    (with tweaks)

    p4raw-id: //depot/perl@28373

:100644 100644 edb4458878b701517f5ee1f25d043734cbab32ff 1c4fff1a071007ed3468c2ff7e8238764be44bc1 M      embed.fnc
:100644 100644 66c67a78c99aee0f33defadc03b46f0d351b6ff0 778baac9d76a37ada1c1fbf1184db5866b2ff122 M      embed.h
:040000 040000 ddafb9974caa023bb1839a1f337c4a7f37c80183 82b562d849361c6ed9f29bce621c64a7e6917b6a M      ext
:100644 100644 5eea726b99e30adb66db5e631805f0857b8dd2e1 da4a1531cfba65f3abed4d0fa2f6b9dff78944f6 M      proto.h
:100644 100644 97e0650c5dd5ea0bdf704ae2fa4f5845ce8a6b77 c99a0f874010da13e89b39482f490c085ebefeab M      regcomp.c
:100644 100644 553baea516f296c93f25baa8c34a6517e38aca32 7310d3c89ee793a2b90b46ed08bef34f3284eb7e M      regcomp.h
:100644 100644 5b8f2447a3b464b5fcd194baead69c2ab9196eaf ffe988898033cb12c0a8a851be89c8e9048c0403 M      regexec.c
:100644 100644 3653b86652c241782e8f4c929100cae0d2dabf98 7fd7e7b1836c831338c601f0b2a4b1d7bbb2b49e M      sv.c

and fixed by this one:

commit e7f38d0fe17e7a846c0ed55e71ebb120a336b887
Author: Yves Orton <>
Date:   Wed Nov 3 10:23:00 2010 +0100

    fix 68564: /g failure with zero-width patterns

    This is based on a patch by Father Chrysostomos <>

    The start class optimisation has two modes, "try every valid start
    position" (doevery) and "flip flop mode" (!doevery) where it tries
    only the first valid start position in a sequence.

    Consider /(\d+)X/ and the string "123456Y", now we know that if we fail
    to match X after matching "123456" then we will also fail to match after
    "23456" (assuming no evil tricks are in place, which disable the
    optimisation anyway), so we know we can skip forward until the check
    /fails/ and only then start looking for a real match. This is flip-flop
    mode.

    Now consider the case with zero-width lookahead under /g: /(?=(\d+)X)/.
    In this case we have an additional failure mode, that is failure when
    we match a zero-width string twice at the same pos(). So now, the
    "flip-flop" logic breaks as it /is/ possible that we could match at
    "23456" when we couldn't match at "123456" because of the zero-length
    twice at the same pos() rule. For instance:

      print $1 for "123"=~/(?=(\d+))/g

    should first match "123". Since $& is zero length, pos() is not
    incremented. We then match again, successfully, except that the match
    is rejected despite technical-success because its $& is also zero
    length and pos() has not advanced. If the flip-flop mode is enabled
    we won't retry until we find a failing character first.

    The point here is that it makes perfect sense to disable the
    "flip-flop" mode optimisation when the start class is inside
    a lookahead as it really doesn't apply.

:100644 100644 74f1aa6bd3b6f8b351eca01babf67c0776cb8b23 52ba05203be5eb2e9588e34f9fbb0205cab33126 M      regcomp.c
:040000 040000 41a19ca407f0fb9a75a6baad0dfdfd73d1ff1464 71fffff03e858606e30b897ba9b8477e546bb0d0 M      t
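
The one-liner in the commit message above can be tried directly. The following is only a minimal sketch: the helper name lookahead_captures is made up for illustration and is not part of the commit or of perl itself. On a perl that includes the fix, every digit position should yield a zero-width match, because the engine has to retry at the next position once a second zero-length match at the same pos() has been rejected.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the commit): collect $1 from every
# zero-width match of the lookahead under /g, in list context.
sub lookahead_captures {
    my ($string) = @_;
    my @captures = $string =~ /(?=(\d+))/g;
    return @captures;
}

# With the fix in place this prints 123, 23 and 3, one per line.
print "$_\n" for lookahead_captures("123");
```

Going by the commit message's own description, a perl with flip-flop mode still enabled inside the lookahead would not retry at "23" or "3", so the later captures would be missing from the list.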

Fixing this reported bug:

Nicholas Clark
