develooper Front page | perl.perl5.porters | Postings from March 2016

Re: [perl #126182] /(.(?2))((?<=(?=(?1)).))/ hangs and eats allavailable RAM

Thread Previous | Thread Next
From:
Zefram
Date:
March 8, 2016 17:21
Subject:
Re: [perl #126182] /(.(?2))((?<=(?=(?1)).))/ hangs and eats allavailable RAM
Message ID:
20160308172058.GI7946@fysh.org
demerphq wrote:
>Anyway, much of the dialog that you posted here IMO is unrelated to
>regex infinite recursion, it relates to how some parts of the regex
>handle matching the empty string.

It's almost precisely equivalent.  The problems you're addressing in (?R)
type recursion are the same ones that occur with * iteration, because the
iteration really operates in a recursive fashion.  In general, /(x*)/
is equivalent to /((?:x(?1))?)/, and for the purposes of which ways it
can match those are equivalent to /((?:(?1)x)?)/.  The (documented)
behaviour that captures by the recursively-called subpattern are not
available to the calling pattern breaks some of my constructions, but
generally the same problems apply.

>Can you show a pattern where regex recursion does not do the right
>thing?

Just using left recursion, not emulating a zero-length iteration, consider

	"abbc" =~ /a((?:(?1)b)?)c/

Mathematically this does match, in exactly one way.  But the naive
execution of the pattern would infinitely recurse, attempting to try out
infinitely many paths that would not match before it'll consider the
way that actually does.  So acceptable results are to yield a match,
hang, exhaust memory, or signal an exception to forestall hanging.
5.22 signals "Infinite recursion in regex", which is fine.  blead says
it does not match, which is the wrong answer.

The left recursion can be rescued by changing the ? to ??, so preferring
less recursion.  Of course this changes the equivalent iterative construct
from * to *?.  With that change, blead correctly shows a match, but 5.22
still signals "Infinite recursion".

Looking at some zero-length-iteration cases, I found some curious
behaviour on

	"abc" =~ /a((?1)??)c/

5.22 signals "Infinite recursion" for this, not surprisingly, but
blead hangs, slowly eating memory.  Hanging is acceptable here, as I've
said, but this seems to be a regression from the point of view of your
objective of avoiding hanging.  I'd expect your fail-on-recursion logic
to (correctly) say that this does not match, in finite time.  There are
closely related cases, such as the same thing with ? in place of ??,
for which blead produces the right answer and 5.22 signals "Infinite
recursion".

>The only problem are things like /(?<foo>(?&foo))/.

With unconditional recursion, that has no way to match using a finite
amount of structure, and mathematically it's tautological, so I'm fine
about it being shortcutted.  I'm concerned about things that in principle
can actually match.

-zefram

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About