perl.perl5.porters | Postings from February 2020

Re: Backporting ac3afc4b35 (regcomp.c: make \K+ and \K* illegal.)

Karl Williamson
February 11, 2020 16:41
I have further thoughts on this.

On 2/11/20 5:01 AM, wrote:
> Dan Book <> wrote:
> :On Mon, Feb 10, 2020 at 6:43 PM <> wrote:
> :> Karl Williamson <> wrote:
> :> :I can't think of any legitimate purposes for quantifying any zero-length
> :> :assertion.
> :> :
> :> :Using the general form of expressing them, I don't see a difference
> :> :between any of {0,1} {0,2} {0,3} ... {0, infinity}.  They are all
> :> :no-ops, except possibly for side effects in code blocks, as Grinnz
> :> :pointed out on IRC.
> :> :
> :> :Nor do I see any difference between {1,1} {1,2} ... {2,2} {2,3} ...
> :> :in programs that aren't in error.
> :> :
> :> :So why can't we optimize all such constructs to their minimal forms.
> :> :That would make \K+ optimize to effectively \K{1,1}, and the infinite
> :> :loop would go away, and non-broken programs wouldn't be affected.  This
> :> :could be done for 5.30.2, I think
> :> :
> :> :We could make it fatal in blead.
> :> :
> :> :And optimizing all quantifiers of zero width assertions away or to {0,1}
> :> :in blead would prevent things like this from happening anywhere.
> :>
> :> If we can optimize them to a non-looping form for 5.30.2, that seems
> :> like a fine solution - there seems no reason thereafter to make it fatal
> :> in blead.
> :>
> :
> :The reason to make them fatal in blead is because they are nonsensical, and
> :so this user error should be pointed out loudly.
> For the general case - "quantifying any zero-length assertion" - I don't
> agree that they are nonsensical. These are combinations of two legitimate
> language constructs with a clearly defined meaning both alone and in
> combination, certainly for the finite cases and arguably for the infinite
> ones. 

I agree.

> The fact that in some cases something with identical effect can be
> expressed more succinctly might merit a warning, but if we can simply
> match what is asked it seems rude to choose to die on them instead.

I agree.  Contrary to what some people would argue, backwards 
compatibility is important for a mature code base like Perl.  If there 
are really stupid things that would have been fatal if their possibility 
had occurred to us when we first wrote the code, that's what 'use re 
"strict"' is for.

In the case of quantifying \K, it's perhaps so stupid that no one (this 
was brought to our attention by a fuzzer) had thought to do it in all 
the years it's been out there.  If so, making it fatal might not hurt 
anyone.  But I don't think we can backport that change.

I do think that Perl, either by design or through carelessness, has tried 
to DWIM without notice, and in too many cases got it wrong: it read the 
programmer's mind wrongly and gave a nonsensical result, not what the 
programmer meant at all.  An example is a{,5}, which someone would likely 
think meant a{0,5}, as it does in some other languages.  But it didn't.  
It silently became the literal string "a{,5}".  One little typo in the 
syntax of a quantifier, and you got something completely different.  
Nowadays it is fatal, and it's been fatal long enough that I plan to make 
it mean the quantifier {0,5} in 5.34.  But it took nearly a decade to get 
to this point because of the need to avoid breaking existing code.  I 
have become chastened about adding even warnings for stupid things.
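
To make the a{,5} contrast concrete, here is a minimal sketch in Python, 
whose re module is one of the languages where a missing lower bound means 
zero.  It is not Perl: the old Perl reading is only imitated by escaping 
the pattern, and the helper name matches_fully is made up for this 
illustration.

```python
import re

# Python reads {,5} as {0,5}: between zero and five repetitions of "a".
quantified = re.compile(r"a{,5}")

# Imitation of the old Perl reading, in which the malformed quantifier was
# silently taken as literal text. re.escape is used only to model that
# reading; no real perl is involved here.
literal = re.compile(re.escape("a{,5}"))


def matches_fully(pattern, text):
    """Return True if the whole of `text` matches `pattern` (hypothetical helper)."""
    return pattern.fullmatch(text) is not None
```

Under the quantifier reading, "aaa" and the empty string match and the 
text "a{,5}" does not; under the literal reading it is exactly the 
other way around, which is why the typo was so easy to miss.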

> On the other hand, I think the proposed change would break e.g.:
>    qr{ (?= foo (??{ bar() }) baz )+ }x
> .. which I think is an argument against rushing anything out for the more
> general case - IIRC we already have bugs backtracking over lookaheads
> with backreferences, probably due to similar thinking.

You are right.  I neglected to mention that the change would not affect 
assertions that have things like (??...) in them.  In regcomp.c, one of 
the flags that does seem to work is POSTPONED, which indicates that one 
of these has been seen within the current construct.  That flag signals 
that these optimizations should not be applied.
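
The rule being discussed can be modeled roughly as follows.  This is a 
sketch in Python, not code from regcomp.c: the function name 
minimal_quantifier and the has_postponed parameter are made up here, with 
has_postponed standing in for the POSTPONED flag, and max_count=None 
standing in for an unbounded quantifier.

```python
from typing import Optional, Tuple

# (min, max) repetition counts; max=None means "no upper bound".
Quant = Tuple[int, Optional[int]]


def minimal_quantifier(min_count: int,
                       max_count: Optional[int],
                       has_postponed: bool = False) -> Quant:
    """Reduce a quantifier on a zero-length assertion to its minimal form.

    Hypothetical model of the proposal in this thread, not Perl's code.
    """
    if min_count < 0 or (max_count is not None and max_count < min_count):
        raise ValueError("invalid quantifier bounds")
    # Stand-in for the POSTPONED flag: something like (??{...}) was seen,
    # so the quantifier is left exactly as written.
    if has_postponed:
        return (min_count, max_count)
    # {0,0} can never match the assertion at all: effectively a no-op.
    if max_count == 0:
        return (0, 0)
    # {0,1} {0,2} ... {0,infinity} all collapse to {0,1}.
    if min_count == 0:
        return (0, 1)
    # {1,1} {1,2} ... {2,2} {2,3} ... all collapse to {1,1}.
    return (1, 1)
```

Under this model \K+ (that is, {1,}) becomes (1, 1) and \K* becomes 
(0, 1), so nothing is left to loop on, while any quantifier seen together 
with a POSTPONED construct comes back unchanged.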

But I forgot to consider what happens when a \K has a POSTPONED 
construct in it.  So your point brought that to my attention.

> For the specific case of \K, however, I see no obvious barrier to optimizing
> /\K{m,n}/ to one of noop, \K? or \K.

What about POSTPONED constructs?  I'm unclear about this.
> Hugo
