perl.perl5.porters | Postings from February 2020

Re: Backporting ac3afc4b35 (regcomp.c: make \K+ and \K* illegal.)

From: hv
Date: February 11, 2020 19:54
Subject: Re: Backporting ac3afc4b35 (regcomp.c: make \K+ and \K* illegal.)
Message ID: 202002111936.01BJaYP15055@crypt.org
Karl Williamson <public@khwilliamson.com> wrote:
:On 2/11/20 10:12 AM, hv@crypt.org wrote:
:> I thought we were talking about /foo (?:\K+)/, not /(?:foo \K)+/?
:> 
:> I had assumed that \K only had to (effectively) set a pointer to what
:> we're now pretending the match point is, and the second iteration
:> of a \K{1,2} would just need to set the pointer to that same thing.
:> 
:> If that's not the case, it would probably help to have a concrete
:> example.
:
:
:We are talking about
:
:/foo\K{m,n}bar/
:
:where 'foo' is arbitrary.  Already {0,0} is optimized to OPFAIL.
:
:The proposal would be to turn any other {0,n} into {0,1}, or more 
:concisely '?', and any other {m,n} into {1,1} or simply \K.  (For {1,1}+ 
:there would also have to be a non-backtracking marker added.)
:
:What should happen if foo contains a code block?

I don't know why it should make any difference if foo contains a code
block: the quantifier applies only to the preceding atom, which is \K in
this case. So the question is only how many times we perform KEEPS. I
would expect it to be safe (though effectively a no-op) to perform it
more than once, just more efficient to perform it only once.
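The claim that repeated KEEPS should be harmless can be checked without putting any quantifier on \K at all. The sketch below spells \K out one, two and three times by hand, which is roughly what \K{1,1}, \K{2,2} and \K{3,3} would have to do if each iteration simply performed KEEPS again at the same position, and compares the @- offsets. The helper name `offsets` is made up for illustration; this tests the hand-unrolled form only, not what regcomp.c does with a quantified \K.

```perl
use strict;
use warnings;

# Concatenate the @- offsets of a successful match, as `print @-` does
# in the one-liners in this thread: $-[0] is where $& starts (moved
# forward by \K); $-[1] and $-[2] are the starts of the capture groups.
# (offsets is a hypothetical helper, not part of any module.)
sub offsets {
    my ($subject, $re) = @_;
    return $subject =~ $re ? join('', @-) : 'no match';
}

# \K written out by hand one, two and three times in a row. Each extra
# \K re-records the same position, so the offsets should not change.
my @patterns = (
    qr{(a(??{"."}))\K(.)},
    qr{(a(??{"."}))\K\K(.)},
    qr{(a(??{"."}))\K\K\K(.)},
);

for my $re (@patterns) {
    print offsets("abacade", $re), "\n";    # each line prints 202
}
```

All three patterns report "202", the same result as the single-\K one-liner, because every \K after the first marks a position that has already been marked.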

Again, I feel that I'm missing something.

For this case I would expect and do get "646":
  perl -wle '"abacade" =~ m{(a(??{"."})\K){1,5}(.)} && print @-'
I don't think we're talking about changing this case.

For this case I would expect and do get "202":
  perl -wle '"abacade" =~ m{(a(??{"."}))\K(.)} && print @-'
.. since the preamble is hit only once.

For this I would also expect "202":
  perl -wle '"abacade" =~ m{(a(??{"."})\K{1,1})(.)} && print @-'
Instead the match fails (along with a "Quantifier unexpected on zero-length
expression" warning). That feels like a bug to me.

For this I would also expect "202":
  perl -wle '"abacade" =~ m{(a(??{"."})\K{1,5})(.)} && print @-'
.. but given the {1,1} case fails, it's no surprise this fails too.

If it were doing the right thing (without the proposed optimization), we
would still execute the deferred eval only once; it is just the \K that
would be executed more than once (and the second and subsequent times
should change nothing).

I think that, correctly implemented, \K{m,n} would be provably identical
to \K for all m > 0, n >= m.
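As a rough illustration of the rewrite under discussion ({0,n} becomes \K?, and any {m,n} with m > 0 becomes plain \K), here is a hypothetical Perl helper that maps quantifier bounds to the simpler regex text. It is not code from regcomp.c: the name `simplify_keep_quantifier` is invented, {0,0} is deliberately left alone because Karl reports it is already optimized to OPFAIL, and possessive or lazy suffixes (which Karl notes would need a non-backtracking marker) are not modelled.

```perl
use strict;
use warnings;

# Hypothetical helper: given the bounds of a \K{m,n} quantifier, return
# the equivalent simpler regex text under the proposed rewrite.
# $max is undef for an unbounded quantifier, e.g. \K{2,}, \K* ({0,})
# or \K+ ({1,}).
sub simplify_keep_quantifier {
    my ($min, $max) = @_;
    # Treat n > m as a caller error in this sketch.
    die "min must not exceed max\n" if defined $max && $min > $max;
    # {0,0} is outside the proposal; leave it to the existing handling.
    return undef if defined $max && $max == 0;
    # {0,n} with n > 0: KEEPS may or may not run, i.e. \K?
    return '\K?' if $min == 0;
    # {m,n} with m > 0: KEEPS runs at least once; later runs are no-ops.
    return '\K';
}

print simplify_keep_quantifier(1, 5),     "\n";    # prints \K
print simplify_keep_quantifier(0, 3),     "\n";    # prints \K?
print simplify_keep_quantifier(2, undef), "\n";    # prints \K
```

Collapsing every m > 0 case to a single \K is only valid if Hugo's idempotence argument holds, which is exactly the property the thread is debating.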

Hugo
