## Re: [perl #133352] Ancient Regex Regression

From:
Dave Mitchell
Date:
July 17, 2018 15:02
Subject:
Re: [perl #133352] Ancient Regex Regression
Message ID:
4786_1531839745_5B4E04FD_4786_5_1_20180717150206.GF2753@iabyn.com
```
On Tue, Jul 10, 2018 at 02:14:03PM -0700, Deven T. Corzine via RT wrote:
> Yeah, that's why I said the correct "seems" to be "a".  There's a decent
> argument for returning undef

My take on it is based on two (possibly conflicting) ideals.

The first is that when starting the (N+1)th iteration of a '*' expression
or similar, the captures from the Nth iteration are still valid, until
over-written by the (N+1)th iteration. This allows patterns of this form
to work:

print "1=[$1]\n" if "AA-ABB-" =~ /^ (?: \1? (\w) \1 - )* $/x

which outputs '1=[B]'.

On the first iteration:
the first  \1 fails to match anything, and is skipped via the '?';
the second \1 matches 'A'

On the second iteration:
the first  \1 matches 'A' - the value from the last iteration
the second \1 matches 'B' - the value from the current iteration

The second principle is that, following an alternation, the captures
from branches that failed or weren't tried are invalid.

Neither of these match:

print "matched\n" if "B" =~ /^ (?: A(.) | B ) \1 $/x;
print "matched\n" if "B" =~ /^ (?: A | B | (.) ) \1 $/x;

So the real question is, at the end of an alternation, should any
'unused' captures within the alternation be flagged as invalid,
or should they be preserved, retaining the values they had at the start of
the alternation (which may be real values if this is a second or later
iteration of an enclosing '*' etc)?

I think you could argue it either way. However, since this bug has been
around since forever, with no-one apparently noticing it before, I think
we can fix it how we like - so we should pick whichever is easiest to
implement.

Invalidating captures set by a failing branch involves just knowing the
max index at the start and end of the branch execution, and invalidating
everything in between; restoring previous values involves saving a whole
set of capture indices and restoring them on failure (which is what I
think your patch does). The latter sounds a whole lot more expensive, and
would potentially slow down all alternations.

--
The Enterprise is involved in a bizarre time-warp experience which is in
some way unconnected with the Late 20th Century.
-- Things That Never Happen in "Star Trek" #14

```
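For anyone who wants to try the two principles locally, here is a minimal sketch that wraps the three patterns from the message in a small script. The variable names (`$last`, `$alt_a`, `$alt_b`) are made up for illustration and do not come from the original post. The script only exercises the uncontroversial cases described above, not the disputed pattern from the ticket itself.

```perl
use strict;
use warnings;

# Principle 1: a capture set in iteration N is still visible at the start
# of iteration N+1, until the group is set again.
my $last = "AA-ABB-" =~ /^ (?: \1? (\w) \1 - )* $/x ? $1 : undef;
print "iteration: ", (defined $last ? "1=[$last]" : "no match"), "\n";

# Principle 2: after an alternation, captures belonging to branches that
# failed or were never tried cannot be used by a later backreference.
my $alt_a = "B" =~ /^ (?: A(.) | B ) \1 $/x     ? 1 : 0;
my $alt_b = "B" =~ /^ (?: A | B | (.) ) \1 $/x  ? 1 : 0;
print "alternation 1: ", ($alt_a ? "matched" : "no match"), "\n";
print "alternation 2: ", ($alt_b ? "matched" : "no match"), "\n";
```

Going by the description in the message, the first line should report `1=[B]` and both alternation checks should report `no match`.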