Front page | perl.perl5.porters |
Postings from February 2020
Re: Backporting ac3afc4b35 (regcomp.c: make \K+ and \K* illegal.)
From:
Karl Williamson
Date:
February 11, 2020 16:41
Subject:
Re: Backporting ac3afc4b35 (regcomp.c: make \K+ and \K* illegal.)
Message ID:
0d2c07f5-1e32-6aa4-1083-264498283802@khwilliamson.com
I have further thoughts on this.
On 2/11/20 5:01 AM, hv@crypt.org wrote:
> Dan Book <grinnz@gmail.com> wrote:
> :On Mon, Feb 10, 2020 at 6:43 PM <hv@crypt.org> wrote:
> :> Karl Williamson <public@khwilliamson.com> wrote:
> :> :I can't think of any legitimate purposes for quantifying any zero-length
> :> :assertion.
> :> :
> :> :Using the general form of expressing them, I don't see a difference
> :> :between any of {0,1} {0,2} {0,3} ... {0, infinity}. They are all
> :> :no-ops, except possibly for side effects in code blocks, as Grinnz
> :> :pointed out on IRC.
> :> :
> :> :Nor do I see any difference between {1,1} {1,2} ... {2,2} {2,3} ...
> :> :in programs that aren't in error.
> :> :
> :> :So why can't we optimize all such constructs to their minimal forms?
> :> :That would make \K+ optimize to effectively \K{1,1}, and the infinite
> :> :loop would go away, and non-broken programs wouldn't be affected. This
> :> :could be done for 5.30.2, I think.
> :> :
> :> :We could make it fatal in blead.
> :> :
> :> :And optimizing all quantifiers of zero width assertions away or to {0,1}
> :> :in blead would prevent things like this from happening anywhere.
> :>
> :> If we can optimize them to a non-looping form for 5.30.2, that seems
> :> like a fine solution - there seems no reason thereafter to make it fatal
> :> in blead.
> :>
> :
> :The reason to make them fatal in blead is because they are nonsensical, and
> :so this user error should be pointed out loudly.
>
> For the general case - "quantifying any zero-length assertion" - I don't
> agree that they are nonsensical. These are combinations of two legitimate
> language constructs with a clearly defined meaning both alone and in
> combination, certainly for the finite cases and arguably for the infinite
> ones.
I agree.
> The fact that in some cases something with identical effect can be
> expressed more succinctly might merit a warning, but if we can simply
> match what is asked it seems rude to choose to die on them instead.
I agree. Contrary to what some people would argue, backwards
compatibility is important for a mature code base like Perl. If there
are really stupid things that would have been fatal if their possibility
had occurred to us when we first wrote the code, that's what 'use re
"strict"' is for.
In the case of quantifying \K, it's perhaps so stupid that no one (this
was brought to our attention by a fuzzer) had thought to do it in all
the years it's been out there. If so, making it fatal might not hurt
anyone. But I don't think we can backport that change.
I do think that Perl, either by design or carelessness, tried to DWIM
without notice and in too many cases got it wrong, misreading the
programmer's mind and giving a nonsensical result that was not what
the programmer meant at all. An example is a{,5}, where someone likely
would think that meant a{0,5}, as it does in some other languages. But
it didn't. It silently became the literal string "a{,5}". One little
typo in the syntax of a quantifier, and you got something completely
different. Nowadays it is fatal, and it's been fatal long enough that I
plan to make it mean the quantifier {0,5} in 5.34. But it took nearly a
decade to get to this because of avoiding breakage of existing code. I
have become chastened about adding even warnings for stupid things.
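
For comparison, a minimal Python sketch of the {0,5} reading of a{,5}
mentioned above. Python's re module is one of the languages in which an
omitted lower bound means zero. The helper name and the sample strings are
illustrative assumptions, and the sketch says nothing about how any given
Perl release treats the same pattern.

```python
import re

# In Python's re module an omitted lower bound defaults to 0,
# so "a{,5}" is the same quantifier as "a{0,5}".
pattern = re.compile(r"a{,5}")


def matches(text: str) -> bool:
    """Return True if the whole string is zero to five 'a' characters."""
    return pattern.fullmatch(text) is not None


print(matches(""))        # zero repetitions are allowed
print(matches("aaaaa"))   # five repetitions are allowed
print(matches("aaaaaa"))  # six repetitions exceed the upper bound
print(matches("a{,5}"))   # the literal text "a{,5}" is not what is matched
```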
>
> On the other hand, I think the proposed change would break eg:
> qr{ (?= foo (??{ bar() }) baz )+ }x
> .. which I think is an argument against rushing anything out for the more
> general case - IIRC we already have bugs backtracking over lookaheads
> with backreferences, probably due to similar thinking.
You are right. I neglected to mention that the change would not affect
assertions that have things like (??...) in them. In regcomp.c, one of
the flags that does seem to work is POSTPONED, which indicates that one
of these has been seen within the current construct. That flag signals
that these optimizations should not be applied.
But I forgot to consider what happens when a \K has a POSTPONED
construct in it. So your point brought that to my attention.
>
> For the specific case of \K however I see no obvious barrier to optimizing
> /\K{m, n}/ to one of noop, \K? or \K.
What about POSTPONED constructs? I'm unclear about this.
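
The reduction discussed in this thread can be modelled outside of
regcomp.c. The Python sketch below is an illustration only and is not the
regcomp.c code: the function name minimal_quantifier and its postponed
parameter are hypothetical, and it assumes that repeating a zero-width
assertion more than once has no further effect unless a postponed construct
such as (??{...}) is involved.

```python
from typing import Optional, Tuple

Bounds = Tuple[int, Optional[int]]  # (min, max); max=None means unbounded


def minimal_quantifier(min_count: int,
                       max_count: Optional[int],
                       postponed: bool = False) -> Bounds:
    """Reduce a quantifier on a zero-width assertion to a minimal form.

    Hypothetical model of the proposal in this thread:
      {0,0}           -> {0,0}  (no-op)
      {0,n}, n >= 1   -> {0,1}  (like \\K?)
      {m,n}, m >= 1   -> {1,1}  (like a bare \\K)
    If a postponed construct was seen, the bounds are left untouched.
    """
    if min_count < 0 or (max_count is not None and max_count < min_count):
        raise ValueError("invalid quantifier bounds")
    if postponed:
        # Assumption: side effects may differ per iteration, so do not reduce.
        return (min_count, max_count)
    if max_count == 0:
        return (0, 0)
    if min_count == 0:
        return (0, 1)
    return (1, 1)


print(minimal_quantifier(1, None))                  # models \K+
print(minimal_quantifier(0, None))                  # models \K*
print(minimal_quantifier(2, 3))                     # models \K{2,3}
print(minimal_quantifier(1, None, postponed=True))  # left unchanged
```

Under these assumptions the unbounded forms that caused the infinite loop
collapse to a single, non-looping application.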
>
> Hugo
>