develooper Front page | perl.perl5.porters | Postings from July 2018

Re: [perl #133352] Ancient Regex Regression

From: Dave Mitchell
Date: July 18, 2018 08:23
Subject: Re: [perl #133352] Ancient Regex Regression
Message ID: 32328_1531902211_5B4EF902_32328_30_1_20180718082318.GG2753@iabyn.com

On Tue, Jul 17, 2018 at 11:42:11AM -0400, Deven T. Corzine wrote:
> On Tue, Jul 17, 2018 at 11:02 AM, Dave Mitchell <davem@iabyn.com> wrote:
> > So the real question is, at the end of an alternation, should any
> > 'unused' captures within the alternation be flagged as invalid,
> > or should they be preserved - retaining the values they had at the start of
> > the alternation (which may be real values if this is a second+ iteration
> > of an enclosing '*' etc).
> 
> Indeed, and there is certainly room for debate here, as both arguments
> are reasonable.
> 
> Personally, invalidating a successful capture that was part of a
> successful match feels wrong to me.  Consider this example:
> 
>      "foobar" =~ /^ (?: (foo) | (bar) )* $/x;
> 
> Both with and without my patch, this leaves $1="foo" and $2="bar".  If
> the "unused" capture were to be invalidated, this would leave $1
> undefined instead.  Would this be desirable?

It would match the (admittedly ambiguous) bit in perlre that David Nicol
quoted:

    If a group did not match, the associated backreference won't match
    either. (This can happen if the group is optional, or in a different
    branch of an alternation.)
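
For anyone wanting to poke at this, a small script along these lines shows both halves of the question: what the captures hold after the quoted match, and how a backreference to a group in the non-matching branch behaves. The helper names (captures_after, backref_matches, show) are made up for illustration. The value reported for $1 depends on which perl and which patch is in use, so no output is claimed for it here.

```perl
use strict;
use warnings;

# Hypothetical helper: run the quoted pattern and return copies of ($1, $2),
# or an empty list if the match fails.
sub captures_after {
    my ($str) = @_;
    if ($str =~ /^ (?: (foo) | (bar) )* $/x) {
        my ($one, $two) = ($1, $2);
        return ($one, $two);
    }
    return;
}

# Hypothetical helper: group 1 sits in one branch of an alternation and is
# then referenced by \1 after the alternation.
sub backref_matches {
    my ($str) = @_;
    return $str =~ /^ (?: (a) | b ) \1 $/x ? 1 : 0;
}

# Render a capture value, making undef visible.
sub show { my ($v) = @_; return defined $v ? "'$v'" : 'undef' }

my ($one, $two) = captures_after("foobar");
print "\$1=", show($one), " \$2=", show($two), "\n";
print "aa: ", backref_matches("aa"), "\n";   # (a) matched, so \1 can match
print "ba: ", backref_matches("ba"), "\n";   # (a) did not match, so \1 fails
```

The second helper only exercises the single-pass case that perlre describes; it says nothing about what should happen when the alternation is iterated by an enclosing '*'.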

> Yes, that's what my current patch does, and there IS a performance
> issue here, since I'm currently saving and restoring ALL captures,
> whether or not they're in the alternation.  This is unnecessary, but I
> haven't figured out yet how to set the paren floor correctly to only
> save the necessary ones.  I think that would mitigate most of the
> performance impact, though not all of it.

There is considerable performance overhead in saving even just one set of
capture indices - the marginal cost of saving more than one is less. So
saving fewer is good, saving none is a *lot* better.

The only ones needing saving or invalidating are the ones with indices
lying between lastopen+1 .. maxopenparen (I think, based on a quick look).
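
As a toy model of what "invalidate only that range" would mean (this is plain Perl, not the regexec.c code, and the data layout is an assumption made for the sketch): captures are kept as [start, end] pairs indexed by group number, undef stands for "not set", and the parameters are simply named after lastopen and maxopenparen above without modelling how the engine maintains them.

```perl
use strict;
use warnings;

# Toy model only: clear every capture whose index lies in
# lastopen+1 .. maxopenparen, leaving lower-numbered groups alone.
sub invalidate_range {
    my ($caps, $lastopen, $maxopenparen) = @_;
    for my $n ($lastopen + 1 .. $maxopenparen) {
        $caps->[$n] = undef;
    }
    return $caps;
}

# Index 0 is the whole match; group 1 is treated as lying before the
# alternation, groups 2 and 3 as lying inside it.
my @caps = ([0, 6], [0, 3], [3, 6], [6, 9]);
invalidate_range(\@caps, 1, 3);

for my $n (0 .. $#caps) {
    my $c = $caps[$n];
    print "group $n: ", (defined $c ? "$c->[0]..$c->[1]" : "unset"), "\n";
}
```

The point of the sketch is only that invalidation touches a bounded range and needs no saved copy to restore from, which is where the saving over save/restore would come from.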

It might be worth writing an alternative patch which does just the
invalidation, rather than saving/restoring, and see what, if any, tests fail.
Those failures may give more insight into original intent, and whether
saving is worth it.

I suspect that writing the invalidation code might be quite tricky (in the
sense that the eventual code will be simple, but working out what that
code should be exactly may be hard).

> Would it be better to remember the captures some other way?  Perl 6
> returns ALL the captures; should Perl 5 have that capability too?

What do you mean exactly?

-- 
Little fly, thy summer's play my thoughtless hand
has terminated with extreme prejudice.
        (with apologies to William Blake)
