Re: RFC: Adding \p{foo=/re/}

From: demerphq
Date: February 9, 2019 04:52
Subject: Re: RFC: Adding \p{foo=/re/}
Message ID: CANgJU+UyXnM7FFzCEGt-z0EKbGULjb5CGsThNgaU4pVa8Z3iEw@mail.gmail.com

On Thu, 7 Feb 2019, 03:46 Karl Williamson, <public@khwilliamson.com> wrote:

> On 2/5/19 11:27 PM, demerphq wrote:
> > Fwiw, I don't like it. What happens if the pattern includes capture
> > brackets, named recursion or eval ? This seems like a way to squeeze
> > named recursion concepts into the named property functionality without
> > thinking through the ramifications.
> >
> > Yves
>
> The way it's implemented is a separate regex is compiled and executed
> during the compilation of the outer one.  Maybe you know something about
> how that could fail, but it works in my limited testing, so I'm not sure
> you're stated concerns are valid.
>
> It calls subpattern_re = re_compile(pattern, 0);
> and then pregexec(subpattern_re, ...)
>

I hate to say it Karl, but this is what worries me.

This behavior seems like a poorly thought-through attempt to do the same
thing as named recursion. Unless it is very carefully implemented, it will
result in terrible performance problems that we will have to sort out, and
this type of implementation will not make that easy.

Consider something like this:

/\p{foo=/whatever/}\p{foo}/

Given your proposed implementation, how will the optimiser know that this
pattern is equivalent to

/whateverwhatever/

This is what I mean by poorly thought out. How does this integrate with
other behavior: quantifiers, atomic patterns, backtracking, etc.?

How about this:

"abbbabababababcab"=~/\p{foo=/[abc]+/}cab/

how will the optimizer backtrack this pattern?
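
To make the backtracking concern concrete, here is a minimal sketch using only existing Perl syntax. It assumes, purely for illustration, that a subpattern executed as a separate self-contained match behaves like an atomic group, which the outer match cannot re-enter; whether the proposed \p{foo=/.../} would behave this way is exactly the open question. The variable names are hypothetical.

```perl
use strict;
use warnings;

my $str = "abbbabababababcab";

# Inline form: [abc]+ first consumes the whole string, then the engine
# backtracks into it and gives back the trailing "cab".
my $inline = $str =~ /[abc]+cab/ ? 1 : 0;

# Opaque form (assumption): an atomic group stands in for a subpattern that
# is matched once and never re-entered. From every start position [abc]+
# runs to the end of the string, so nothing is left for "cab".
my $opaque = $str =~ /(?>[abc]+)cab/ ? 1 : 0;

print "inline matches: $inline, atomic matches: $opaque\n";
```

The inline pattern matches and the atomic one does not, so the two readings of the feature give different answers on the same input.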

What about

/\p{foo=/(??{rand})/}\p{foo}/

what will that do?

I think this proposal needs a LOT more thought and analysis before it goes
into Perl.

I understand the temptation of "hey, I can trivially bolt this new feature
into Perl", as I have myself been seduced by it in the past, but honestly,
it is a mistake to allow yourself to succumb. It is all too easy to add a
feature using a trivial implementation, but much much more difficult to
address the fallout later when people point out it isn't as efficient as it
should be, or doesn't interact sanely with other features in the regex
engine, or doesn't have a clear definition of the behavior.

I think a lot more thought needs to go into this before it gets added, and
it probably can't be implemented as you said, or patterns using it will
easily become quadratic and then lead to performance complaints.

Some of the questions I have are: How does it interact with capture
buffers? How does it interact with optimizations like the start-class
optimization, mandatory string detection, etc.? How does it interact with
(??{...}) and (?{ ... })? How does it interact with the verbs? How does it
interact with $^R and $REGMARK and $REGERROR? How does it interact with
named recursion? How do we avoid this form of expression becoming quadratic
or disabling optimisations?
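
One way to look at the mandatory string question today is the regmust function exported by the re module, which reports the anchored and floating substrings the optimizer found for a compiled pattern. The sketch below first uses the example from the re documentation, then compares an inline literal with the same text reached through named recursion. What the last two calls report depends on the Perl version, so no output is claimed for them; the helper name show is hypothetical.

```perl
use strict;
use warnings;
use re qw(regmust);

# Example taken from the re module documentation: one anchored and one
# floating mandatory substring.
my ($anchored, $floating) = regmust(qr/here .* there/x);
print "anchored: '$anchored', floating: '$floating'\n";

# Hypothetical helper: print whatever the optimizer reports, undef included.
sub show {
    my ($label, $qr) = @_;
    my ($anch, $float) = regmust($qr);
    printf "%-10s anchored=%s floating=%s\n",
        $label, $anch // 'undef', $float // 'undef';
}

# The same text written inline and via named recursion. Any difference in
# the report is the kind of lost optimization the questions above are about.
show('inline',    qr/whateverwhatever/);
show('recursive', qr/(?(DEFINE)(?<foo>whatever))(?&foo)(?&foo)/);
```

Running the same comparison under "use re 'debug'" shows the full optimizer output if more detail is needed.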

For instance what happens here:

/\p{foo=/blah(?<name>...)/}(?&foo)/

My experience is that it was easy to add named recursion to Perl,
but much much harder to optimize the result properly. I had to put
significantly more work into the optimization phase than the actual named
recursion implementation, which was pretty trivially added to the existing
EVAL framework.

I think you need to ask and answer a lot more questions than "is it
anchored" before this goes in.

I am *not* opposed to it going in, but these kinds of questions need to be
answered first. So until a much more detailed summary of behavior is
provided I am against this.

To give you an example of my experience with these "neat features", I
implemented (?|...) and it took a few years before some of the flaws were
identified in it, and some of them are still yet to be resolved. I had a
similar experience with named recursion. Given that experience I am now in
the camp that nothing new like this should be added until all these
questions can be answered *first*, and the implementation needs to be smart
enough to resolve those questions in its first release.

So for instance, I could see /\p{foo=/.../}/ being implemented internally
as something like /(?(DEFINE)(?<foo>...))\p{foo}/ except that named
recursion assumes that a named pattern is also a numbered capture buffer,
so something would have to be done to address that. Maybe a form of a
recursive subpattern that doesn't capture explicitly, but then I would
expect it to have an equivalent non \p{...} form, and I wonder how that
would look? Maybe (?<<foo>>...)? Then the implementation would share the
optimisation logic used by the named recursion logic, and we wouldn't have
two totally separate implementations to optimize.
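
The existing (?(DEFINE)...) mechanism that this suggestion builds on can be sketched with current syntax, using (?&foo) where the proposal would write \p{foo}. It shows the capture-numbering issue mentioned above: the named group inside DEFINE still takes a capture buffer number. The pattern and variable names are illustrative only and are not part of the proposal.

```perl
use strict;
use warnings;

my $str = "abbbabababababcab";

# (?<foo>...) inside DEFINE is never matched in place, yet it is still
# capture group 1, which pushes the literal (cab) group to number 2.
my $re = qr/(?(DEFINE)(?<foo>[abc]+))(?&foo)(cab)/;

# In list context a successful match returns one entry per capture group.
my @caps = $str =~ $re;

print "groups returned: ", scalar(@caps), "\n";
print "group 1: ", $caps[0] // 'undef', "\n";
print "group 2: ", $caps[1] // 'undef', "\n";
```

Unlike an atomic group, Perl can backtrack into the (?&foo) call, so this pattern finds the trailing "cab". perlre also notes that groups set inside a recursion are not accessible after it returns, so group 1 is reserved but carries no value here.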

But as is, I think this feature exposes a LOT of questions that need to be
answered before you move forward, and I am VERY doubtful that the naive
implementation you suggest is the right way to do things.

Sorry to be the bearer of bad tidings on this, but once stung, twice shy
and all of that.

Yves
