develooper Front page | perl.perl5.porters | Postings from September 2000

Re: [ID 20000919.002] Infinite Loop In Regexp Machinery

Thread Previous | Thread Next
From:
Ronald J Kimball
Date:
September 19, 2000 19:27
Subject:
Re: [ID 20000919.002] Infinite Loop In Regexp Machinery
Message ID:
20000919222623.B352374@linguist.dartmouth.edu
On Tue, Sep 19, 2000 at 04:33:51PM -0700, Karl Dahlke wrote:
> The following perl program hangs when it evaluates the regexp.
> 
> #  Sample html text from www.spacescience.com.
> #  Note that a quote is missing, which really throws things off.
> #  Still, it shouldn't cause the regexp machinery to hang.
> $buffer = '
> <IMG  SRC="headlines/images/harvestmoon/frontpage.gif" ALIGN="RIGHT"
>     hspace="4 width=" 120" HEIGHT="99" BORDER="2" WIDTH="120" NATURALSIZEFLAG="1"></A>';
> 
> #  Break the text up into html tags.
> print "Pre regexp\n";
> $buffer =~ s/<  # start the tag
> (\/?)  # leading slash turns off the tag
> ([a-zA-Z]+)  # name of the tag
> (([^>"']+  # unquoted stuff inside the tag
> |"[^"]*"  # stuff in double quotes
> |'[^']*'  # stuff in single quotes
> )*)>  # as many of these chunks as you need
> /tag/xsg;
> 
> print "Post regexp\n";  # never gets here

You just didn't wait long enough.  :)

Because the target string has an odd number of double-quotes, the regex is
not going to match the IMG tag.  However, because part of the regex is
equivalent to ([^>"']+)* , the regex engine takes exponential time, based
on the length of the target string, to fail to find a match.

You might be interested in the book _Mastering Regular Expressions_, by
Jeffrey Friedl, which includes a section on regexes with this problem.

Ronald

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About