develooper Front page | perl.perl6.language.regex | Postings from December 2000

Re: Perl 5's "non-greedy" matching can be TOO greedy!

From:
Tom Christiansen
Date:
December 15, 2000 13:37
Subject:
Re: Perl 5's "non-greedy" matching can be TOO greedy!
Message ID:
27229.976916261@chthon
>At worst, this should take no more than double the amount of time that the
>single pass did, probably less.  Hardly a cause to concern ourselves with
>the heat death of the universe.

Oh really?  We have shown that for the kind of global overall
analysis that you are asking for, that in the general case, all
possible paths much be taken.  You cannot short-circuit, because
you must first consider all possibilities and then weigh each valid
result against each other valid result.

Consider something like /.*/ or /.*?/.  For a string a length N,
there are

    (N+1) (N+2)
    -----------
	 2

substrings that that matches.  That means that an 80-byte string
has some 3321 possible substrings, all of which must be considered.

In the short-circuiting version, the Engine need consider but one
single solitary case for each of those.  3321 is not the double of 1.

Consider now something like /(.*)(.*)/ or /(.*?)(.*?)/ or /(.*)(.*?)/
or /(.*)(.*?)/.  You now have


		   2
    ( (N+1) (N+2) )
    ---------------
	   4

cases to consider, or, in the case of an 80-byte string, some
11,029,041 possible choices.  

And with the current, normal, standard, short-circuiting system, 
the Engine has to consider, hm... could it be just one possibility?
And that's just with two wildcards.  People are often writing more
than that.

Can you now see why this would be a problem?  And how even in the
cases where it didn't actually break old programs (many of which it
would!) that it would cause many many them to apparently hang, 
racing for electron death?

--tom



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About