develooper Front page | perl.perl5.porters | Postings from September 2000

Re: regex gripe... (backtracking woes)

Thread Previous | Thread Next
From:
Hugo
Date:
September 21, 2000 16:59
Subject:
Re: regex gripe... (backtracking woes)
Message ID:
200009220000.BAA18291@crypt.compulink.co.uk
In <Pine.GSO.4.21.0009211656380.18425-100000@crusoe.crusoe.net>,
Jeff Pinyan writes:
:I found something out about Perl's regex engine that slightly upset
:me.  Take this example:
:
:  "A1234567890A" =~ /A(?:[^AB]*.B)*[^AB]*A/;

I think this is the same as /A(?:[^AB]|.B)*A/, which should be more
efficient.

[...]
:Why does the engine backtrack through things it should KNOW match /[^AB]/
:for something that can match /.B/?

Presumably because noone has yet wanted it badly enough to write a
patch for it. :) I suspect the effort required to do the subexpression
analysis would actually slow things down a lot in the average case.

However I do think we should be looking at the possibility of
recognising inefficient expressions that we can rewrite to more
efficient equivalents. Trouble is that we'd more often come across
(the moral equivalent of) /A([^AB]*.B)*[^AB]*A/, which I think is
much more difficult to improve on unless we invent something like
cross-nested parens:

  / A (+open 1) (?: [^AB] | .B (+close 1) (+open 1) )* A /x

Hugo

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About