Re: Backporting ac3afc4b35 (regcomp.c: make \K+ and \K* illegal.)

From:
hv
Date:
February 11, 2020 12:19
Subject:
Re: Backporting ac3afc4b35 (regcomp.c: make \K+ and \K* illegal.)
Message ID:
202002111201.01BC1jP13006@crypt.org
Dan Book <grinnz@gmail.com> wrote:
:On Mon, Feb 10, 2020 at 6:43 PM <hv@crypt.org> wrote:
:> Karl Williamson <public@khwilliamson.com> wrote:
:> :I can't think of any legitimate purposes for quantifying any zero-length
:> :assertion.
:> :
:> :Using the general form of expressing them, I don't see a difference
:> :between any of {0,1} {0,2} {0,3} ... {0, infinity}.  They are all
:> :no-ops, except possibly for side effects in code blocks, as Grinnz
:> :pointed out on IRC.
:> :
:> :Nor do I see any difference between {1,1} {1,2} ... {2,2} {2,3} ...
:> :in programs that aren't in error.
:> :
:> :So why can't we optimize all such constructs to their minimal forms?
:> :That would make \K+ optimize to effectively \K{1,1}, and the infinite
:> :loop would go away, and non-broken programs wouldn't be affected.  This
:> :could be done for 5.30.2, I think.
:> :
:> :We could make it fatal in blead.
:> :
:> :And optimizing all quantifiers of zero width assertions away or to {0,1}
:> :in blead would prevent things like this from happening anywhere.
:>
:> If we can optimize them to a non-looping form for 5.30.2, that seems
:> like a fine solution - there seems no reason thereafter to make it fatal
:> in blead.
:>
:
:The reason to make them fatal in blead is because they are nonsensical, and
:so this user error should be pointed out loudly.

For the general case - "quantifying any zero-length assertion" - I don't
agree that they are nonsensical. These are combinations of two legitimate
language constructs with a clearly defined meaning both alone and in
combination, certainly for the finite cases and arguably for the infinite
ones. The fact that in some cases something with identical effect can be
expressed more succinctly might merit a warning, but if we can simply
match what is asked it seems rude to choose to die on them instead.
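
To make the "clearly defined meaning" point concrete, here is a minimal
sketch using a lookahead as the zero-length assertion. The patterns and
the variable names ($twice, $never) are illustrative assumptions, not
taken from the thread; the sketch only checks whether the match succeeds
and says nothing about how many times the engine evaluates the assertion.

```perl
use strict;
use warnings;

# A lookahead is zero-length, so repeating it a fixed number of times
# re-checks the same condition at the same position.
my $twice = qr/^(?=foo){2}foobar$/;   # assertion required twice
my $never = qr/^(?=foo){0}bar$/;      # assertion required zero times

print "twice on 'foobar': ", ("foobar" =~ $twice ? "match" : "no match"), "\n";
print "never on 'bar':    ", ("bar"    =~ $never ? "match" : "no match"), "\n";
print "twice on 'barfoo': ", ("barfoo" =~ $twice ? "match" : "no match"), "\n";
```

In the finite cases the result is the same as for the unquantified (or
absent) assertion, which is why a warning rather than a fatal error looks
like the proportionate response.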

On the other hand, I think the proposed change would break e.g.:
  qr{ (?= foo (??{ bar() }) baz )+ }x
... which I think is an argument against rushing anything out for the more
general case - IIRC we already have bugs backtracking over lookaheads
with backreferences, probably due to similar thinking.

For the specific case of \K however I see no obvious barrier to optimizing
/\K{m,n}/ to one of noop, \K? or \K.
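
The reduction for \K can be sketched as a small table-like function. This
is one reading of the three cases named above, not the code in regcomp.c:
reduce_keep_quantifier is a hypothetical helper, and passing undef for the
upper bound to mean "unbounded" is a convention assumed for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl's regcomp.c): reduce the
# quantifier {min,max} applied to \K to an equivalent non-looping form.
# $max may be undef to mean "no upper bound", as in \K+ or \K{2,}.
sub reduce_keep_quantifier {
    my ($min, $max) = @_;
    die "invalid quantifier\n"
        if $min < 0 || (defined $max && $max < $min);
    return ''     if defined $max && $max == 0;  # \K{0}: never runs, a no-op
    return '\\K?' if $min == 0;                  # optional: at most one \K matters
    return '\\K';                                # required: a single \K suffices
}

# Show the reduced form for a few representative quantifiers.
printf "%-8s => '%s'\n", $_->[0], reduce_keep_quantifier(@{$_}[1, 2])
    for ['\\K{0}', 0, 0], ['\\K*', 0, undef], ['\\K{0,3}', 0, 3],
        ['\\K+', 1, undef], ['\\K{2,5}', 2, 5];
```

None of the three results contains an open-ended repetition, so a
reduction along these lines would leave \K+ and \K* with nothing to loop
over.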

Hugo
