perl.perl5.porters | Postings from February 2019

Re: RFC: Adding \p{foo=/re/}

From: demerphq
Date: February 9, 2019 04:54
Subject: Re: RFC: Adding \p{foo=/re/}
Message ID: CANgJU+X-o0k4ZKMdTB6qFwzuFhJ+rgKbA1EwviZPQbs2Axbddg@mail.gmail.com
On Sat, 9 Feb 2019 at 05:52, demerphq <demerphq@gmail.com> wrote:
>
>
>
> On Thu, 7 Feb 2019, 03:46 Karl Williamson, <public@khwilliamson.com> wrote:
>>
>> On 2/5/19 11:27 PM, demerphq wrote:
>> > Fwiw, I don't like it. What happens if the pattern includes capture
>> > brackets, named recursion or eval? This seems like a way to squeeze
>> > named recursion concepts into the named property functionality without
>> > thinking through the ramifications.
>> >
>> > Yves
>>
>> The way it's implemented is that a separate regex is compiled and executed
>> during the compilation of the outer one.  Maybe you know something about
>> how that could fail, but it works in my limited testing, so I'm not sure
>> your stated concerns are valid.
>>
>> It calls subpattern_re = re_compile(pattern, 0);
>> and then pregexec(subpattern_re, ...)
>
>
> I hate to say it Karl, but this is what worries me.
>
> This behavior seems like a poorly thought-through attempt to do the same thing as named recursion. Unless it is very carefully implemented, it will result in terrible performance problems that we will have to sort out, and this type of implementation will not make that easy.
>
> Consider something like this:
>
> /\p{foo=/whatever/}\p{foo}/
>
> Given your proposed implementation, how will the optimiser know that this pattern is equivalent to
>
> /whateverwhatever/
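
For comparison, here is a minimal sketch of the named recursion that Perl already supports (5.10 and later). It does not use the proposed \p{foo=/re/} syntax. The group name "foo" and the literal "whatever" are borrowed from the example above purely for illustration. Because the definition and both uses live in a single compiled pattern, the engine sees the whole thing at once.

```perl
use strict;
use warnings;

# Existing named-recursion syntax, shown for comparison only.
# The proposed \p{foo=/re/} form is NOT used here.
my $re = qr/
    (?(DEFINE) (?<foo> whatever ) )   # define the group without matching it
    \A (?&foo) (?&foo) \z             # reuse the definition twice
/x;

my $matched_double = "whateverwhatever" =~ $re ? 1 : 0;
my $matched_single = "whatever"         =~ $re ? 1 : 0;

print "double: $matched_double, single: $matched_single\n";
# prints "double: 1, single: 0"
```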
>
> This is what I mean by poorly thought out. How does this integrate with other behavior: quantifiers, atomic patterns, backtracking, etc.?
>
> How about this:
>
> "abbbabababababcab"=~/\p{foo=/[abc]+/}cab/
>
> How will the optimizer backtrack this pattern?
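
The backtracking concern can be illustrated with ordinary Perl regexes, without the proposed syntax. This sketch assumes, hypothetically, that a separately compiled and executed subpattern would behave like an opaque unit that the outer match cannot backtrack into. An atomic group, (?>...), stands in for that opaque behavior here. The actual semantics of the proposal are not specified in this thread.

```perl
use strict;
use warnings;

my $str = "abbbabababababcab";

# Inline subpattern: [abc]+ first grabs everything, then gives back
# "cab" so that the trailing literal can match.
my $greedy = $str =~ /[abc]+cab/     ? 1 : 0;

# Atomic group as a stand-in for an opaque subpattern: once [abc]+ has
# consumed the rest of the string it never gives anything back.
my $atomic = $str =~ /(?>[abc]+)cab/ ? 1 : 0;

print "greedy: $greedy, atomic: $atomic\n";
# prints "greedy: 1, atomic: 0"
```

If the subpattern is opaque, the same text matches or fails depending on whether it is written inline or through the property form, which is the kind of semantic question that needs a defined answer.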
>
> What about
>
> /\p{foo=/(??{rand})/}\p{foo}/
>
> What will that do?
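
As background for that question, here is a small sketch of how (??{...}) behaves in today's Perl: the code block runs while the match is in progress, not when the pattern is compiled. A counter replaces rand so the result is deterministic, and the variable name $calls is made up for this example. What a subpattern executed during compilation of the outer pattern should do with such a block is exactly what is left open.

```perl
use strict;
use warnings;

our $calls = 0;   # hypothetical counter, only for this illustration

# Each (??{...}) block returns a pattern string that is compiled and
# matched at that point of the running match.
my $re = qr/(??{ $calls++; 'a' })(??{ $calls++; 'a' })/;

my $before  = $calls;                  # nothing has run at compile time
my $matched = "aa" =~ $re ? 1 : 0;     # the blocks run during this match

print "before: $before, matched: $matched, calls now nonzero: ",
      ($calls > 0 ? "yes" : "no"), "\n";
```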
>
> I think this proposal needs a LOT more thought and analysis before it goes into Perl.
>
> I understand the temptation of "hey, I can trivially bolt this new feature into Perl", as I have myself been seduced by it in the past, but honestly, it is a mistake to allow yourself to succumb. It is all too easy to add a feature using a trivial implementation, but much much more difficult to address the fallout later when people point out it isn't as efficient as it should be, or doesn't interact sanely with other features in the regex engine, or doesn't have a clear definition of the behavior.
>
> I think a lot more thought needs to go into this before it gets added, and it probably can't be implemented as you said, or patterns using it will easily become quadratic and lead to performance complaints.
>
> Some of the questions I have are: How does it interact with capture buffers? How does it interact with optimizations like the start-class optimization, mandatory string detection, etc.? How does it interact with (??{...}) and (?{ ... })? How does it interact with the verbs? How does it interact with $^R, $REGMARK and $REGERROR? How does it interact with named recursion? How do we avoid this form of expression becoming quadratic or disabling optimisations?

Another question is: does PCRE or any other regex engine support this
already? What semantics do they expose?

Yves
