
Re: [perl #133352] Ancient Regex Regression

From: Dave Mitchell
Date: July 17, 2018 15:02
Subject: Re: [perl #133352] Ancient Regex Regression
Message ID: 4786_1531839745_5B4E04FD_4786_5_1_20180717150206.GF2753@iabyn.com

On Tue, Jul 10, 2018 at 02:14:03PM -0700, Deven T. Corzine via RT wrote:
> Yeah, that's why I said the correct "seems" to be "a".  There's a decent
> argument for returning undef

My take on it is based on two (possibly conflicting) ideals.

The first is that when starting the (N+1)th iteration of a '*' expression
or similar, the captures from the Nth iteration are still valid, until
over-written by the (N+1)th iteration. This allows patterns of this form
to work:

    print "1=[$1]\n" if "AA-ABB-" =~ /^ (?: \1? (\w) \1 - )* $/x

which outputs '1=[B]'.

On the first iteration:
    the first  \1 fails to match anything, and is skipped via the '?';
    the second \1 matches 'A'

On the second iteration:
    the first  \1 matches 'A' - the value from the last iteration
    the second \1 matches 'B' - the value from the current iteration
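
As a rough illustration of the walk-through above, the sketch below wraps the same pattern in a helper (last_capture is a name invented here, not something from the thread) and runs it against the original string plus one contrasting string chosen for this example. The second string is expected to fail, because its second iteration can only succeed if the second \1 sees the current iteration's capture.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the final value of $1 if the pattern from
# the message matches, or undef if it does not match.
sub last_capture {
    my ($str) = @_;
    return $str =~ /^ (?: \1? (\w) \1 - )* $/x ? $1 : undef;
}

# 'AA-ABB-' is the string from the message; 'AA-ABC-' is an extra case
# added for contrast.
for my $str ('AA-ABB-', 'AA-ABC-') {
    my $cap = last_capture($str);
    print "$str: ", (defined($cap) ? "1=[$cap]" : "no match"), "\n";
}
```

Note that 'AA-ABB-' can only match if the leading \1? in the second iteration consumes the 'A' captured by the first iteration, which is the behaviour the first ideal describes.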

The second principle is that, following an alternation, the captures
from branches that failed or weren't tried are invalid.

Neither of these matches, because the capture group sits in a branch
that failed or was never tried:

    print "matched\n" if "B" =~ /^ (?: A(.) | B ) \1 $/x;
    print "matched\n" if "B" =~ /^ (?: A | B | (.) ) \1 $/x;
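
For contrast, here is a small sketch that runs the first of those patterns against a string where the capturing branch does succeed. The test strings 'Axx' and 'Bx' and the %matched hash are additions for illustration only; just the "B" case comes from the message.

```perl
use strict;
use warnings;

# First pattern from the message: the capture lives only in the 'A' branch.
my $re = qr/^ (?: A(.) | B ) \1 $/x;

my %matched;
for my $str ('Axx', 'B', 'Bx') {
    # \1 can only match when the 'A(.)' branch was the one that succeeded.
    $matched{$str} = $str =~ $re ? 1 : 0;
    print "$str: ", ($matched{$str} ? "matched" : "no match"), "\n";
}
```

Only 'Axx' should match here: in the other two cases the 'B' branch is taken, so \1 refers to a capture that was never set.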

So the real question is, at the end of an alternation, should any
'unused' captures within the alternation be flagged as invalid,
or should they be preserved, retaining the values they had at the start of
the alternation (which may be real values if this is a second or later
iteration of an enclosing '*' etc.)?

I think you could argue it either way. However, since this bug has been
around since forever, with no-one apparently noticing it before, I think
we can fix it how we like - so we should pick whichever is easiest to
implement.

Invalidating captures set by a failing branch involves just knowing the
max index at the start and end of the branch execution, and invalidating
everything in between; restoring previous values involves saving a whole
set of capture indices and restoring them on failure (which is what I
think your patch does). The latter sounds a whole lot more expensive, and
would potentially slow down all alternations.
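
To make the cost difference concrete, here is a toy model of the two strategies. It is only a sketch and does not reflect the regex engine's actual data structures: the capture array, the helper names invalidate_range and try_branch_with_restore, and the failing branch are all invented for this example.

```perl
use strict;
use warnings;

# Toy model only: captures live in an array, and undef means "invalid".

# Strategy 1: after a branch fails, invalidate every capture index the
# branch could have set (the range from $first to $last).
sub invalidate_range {
    my ($caps, $first, $last) = @_;
    $caps->[$_] = undef for $first .. $last;
    return;
}

# Strategy 2: snapshot all captures before trying the branch and put them
# back if it fails. The copy is paid on every branch attempt.
sub try_branch_with_restore {
    my ($caps, $branch) = @_;
    my @saved = @$caps;
    my $ok    = $branch->($caps);
    @$caps = @saved unless $ok;
    return $ok;
}

# A hypothetical branch that sets capture 1 and then fails.
my $failing_branch = sub { my ($caps) = @_; $caps->[1] = 'x'; return 0 };

# Capture 1 holds 'a' from an earlier iteration in both runs.
my @invalidated = (undef, 'a');
$failing_branch->(\@invalidated);
invalidate_range(\@invalidated, 1, 1);    # capture 1 ends up undef

my @restored = (undef, 'a');
try_branch_with_restore(\@restored, $failing_branch);    # capture 1 is 'a' again

print "invalidate: ", (defined($invalidated[1]) ? $invalidated[1] : 'undef'), "\n";
print "restore:    ", (defined($restored[1])    ? $restored[1]    : 'undef'), "\n";
```

In this model the first strategy touches only the indices inside the branch, while the second copies the whole capture set whether or not the branch fails.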

-- 
The Enterprise is involved in a bizarre time-warp experience which is in
some way unconnected with the Late 20th Century.
    -- Things That Never Happen in "Star Trek" #14
