develooper Front page | perl.perl5.porters | Postings from September 2000

regex gripe... (backtracking woes)

Thread Next
Jeff Pinyan
September 21, 2000 13:57
regex gripe... (backtracking woes)
Message ID:
I found something out about Perl's regex engine that slightly upset
me.  Take this example:

  "A1234567890A" =~ /A(?:[^AB]*.B)*[^AB]*A/;

This is like reversing the /"[^"\\]*(?:\\.[^"\\]*)*"/ regex, except I
change " to A and \\ to B to reduce noise.

Now, I was HOPING Perl would do the following:

  <>  <A1234567890A>  =~ A
  <A>  <1234567890A>  =~ [^AB]*
  <A1234567890>  <A>  =~ .
  <A1234567890A>  <>  =~ B       FAILED
  <A123456789>  <0A>  =~ .
  <A1234567890>  <A>  =~ B       FAILED
  # at this point, it knows the part it matched as [^AB]*
  # can't POSSIBLY be an A or a B, so it doesn't keep whittling
  # down the 1234567890 section in search of /.B/
  <A1234567890>  <A>  =~ A
  <A1234567890A>  <>  DONE

However, this is sadly not the case.  Why isn't Perl's regex engine smart
enough to see that backtracking through /[^AB]*/ for /.B/ is a lost cause?

And I wish I could use (?>) here somewhere, but it would be too greedy,
and cause much angst.  If I did, the program would break on a string like
"A12345B6789A" which is perfectly valid.

I get around it by doing:


which is admittedly evil.  It seems a shame I have to go from


to its reverse of


to keep it efficient.  So this brings up my previous question.

Why does the engine backtrack through things it should KNOW match /[^AB]/
for something that can match /.B/?

Jeff "japhy" Pinyan
PerlMonth - An Online Perl Magazine  
The Perl Archive - Articles, Forums, etc.
CPAN - #1 Perl Resource  (my id:  PINYAN)

Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About