develooper Front page | perl.perl5.porters | Postings from September 2000

regex gripe... (backtracking woes)

Thread Next
From:
Jeff Pinyan
Date:
September 21, 2000 13:57
Subject:
regex gripe... (backtracking woes)
Message ID:
Pine.GSO.4.21.0009211656380.18425-100000@crusoe.crusoe.net
I found something out about Perl's regex engine that slightly upset
me.  Take this example:

  "A1234567890A" =~ /A(?:[^AB]*.B)*[^AB]*A/;

This is like reversing the /"[^"\\]*(?:\\.[^"\\]*)*"/ regex, except I
change " to A and \\ to B to reduce noise.

Now, I was HOPING Perl would do the following:

  <>  <A1234567890A>  =~ A
  <A>  <1234567890A>  =~ [^AB]*
  <A1234567890>  <A>  =~ .
  <A1234567890A>  <>  =~ B       FAILED
  <A123456789>  <0A>  =~ .
  <A1234567890>  <A>  =~ B       FAILED
  # at this point, it knows the part it matched as [^AB]*
  # can't POSSIBLY be an A or a B, so it doesn't keep whittling
  # down the 1234567890 section in search of /.B/
  <A1234567890>  <A>  =~ A
  <A1234567890A>  <>  DONE

However, this is sadly not the case.  Why isn't Perl's regex engine smart
enough to see that backtracking through /[^AB]*/ for /.B/ is a lost cause?

And I wish I could use (?>) here somewhere, but it would be too greedy,
and cause much angst.  If I did, the program would break on a string like
"A12345B6789A" which is perfectly valid.

I get around it by doing:

  /A(?:(?>[^AB]*)(?:(?<=.)|[AB])B)*[^AB]*A)/;

which is admittedly evil.  It seems a shame I have to go from

  /A[^AB]*(?:B.[^AB]*)*/

to its reverse of

  /A(?:(?>[^AB]*)(?:(?<=.)|[AB])B)*[^AB]*A)/

to keep it efficient.  So this brings up my previous question.

Why does the engine backtrack through things it should KNOW match /[^AB]/
for something that can match /.B/?

-- 
Jeff "japhy" Pinyan     japhy@pobox.com     http://www.pobox.com/~japhy/
PerlMonth - An Online Perl Magazine            http://www.perlmonth.com/
The Perl Archive - Articles, Forums, etc.    http://www.perlarchive.com/
CPAN - #1 Perl Resource  (my id:  PINYAN)        http://search.cpan.org/


Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About