[perl #22051] segfault (deep recursion?) in regex match

From: Steve Peters via RT
Date: March 30, 2006 19:26
Subject: [perl #22051] segfault (deep recursion?) in regex match
Message ID: rt-3.0.11-22051-131755.3.99702424868032@perl.org

> [pcg@goof.com - Sun Apr 27 17:02:30 2003]:
> 
> 
> This is a bug report for perl from root@cerebro.laendle,
> generated with the help of perlbug 1.34 running under perl v5.8.1.
> 
> 
> -----------------------------------------------------------------
> [Please enter your report here]
> 
> Due to a bug, I once fed the wrong text into some regex and earned a
> segfault in a function that had seemed to segfault occasionally before,
> but I never found a good test case.
>
> This test case isn't very good, either, because it seems to require a
> big document, which I put on my webserver so I didn't need to attach it.
>
> Here is the program that segfaults with both perl-5.8.0 from Debian as
> well as with my own perl-5.8.1 MAINT19040:
> 
>    # just get the test data into $data
>    use LWP::Simple;
>    $data = get "http://data.plan9.de/macbeth.xml";
> 
>    # the segfault occurs on the second round (I think) in the first regex.
>    for(;;) {
>       $data =~ /\G([:?])>((?:[^<]+|<[^:?])*)/xgcs or last;
>       $data =~ /\G<([:?])((?:[^:?]+|[:?][^>])*)/gcs or last;
>    }
> 
> When I run this program I get a segfault because of a very deep recursion:
> 
>    #0  S_regmatch (prog=0x81252a8) at regexec.c:2237
>    #1  0x080e605e in S_regmatch (prog=0x8125318) at regexec.c:3244
>    #2  0x080e5068 in S_regmatch (prog=0x81252a8) at regexec.c:3789
>    #3  0x080e605e in S_regmatch (prog=0x81252a8) at regexec.c:3244
>    #4  0x080e605e in S_regmatch (prog=0x8125318) at regexec.c:3244
>    #5  0x080e5068 in S_regmatch (prog=0x81252a8) at regexec.c:3789
>    #6  0x080e605e in S_regmatch (prog=0x81252a8) at regexec.c:3244
>    ...
>    #18941 0x080e5068 in S_regmatch (prog=0x81252a8) at regexec.c:3789
>    #18942 0x080e605e in S_regmatch (prog=0x8125318) at regexec.c:3244
>    #18943 0x080e5cef in S_regmatch (prog=0x8125250) at regexec.c:3079
>    #18944 0x080e2c4f in S_regtry (prog=0x812520c, startpos=0x8431819 "?>\n<!DOCTYPE PLAY SYSTEM \"play.dtd\">\n\n<PLAY>\n<TITLE>The Tragedy of Macbeth</TITLE>\n\n<FM>\n<P>Text placed in th
> 
> I don't know if this is a bug or if the document is simply too long to be
> matched by a regex (which might be suboptimal, although I tried to make it
> perform OK). In case you wonder, it is used to match "<: code :>" sections
> inside some other text (optionally "<? code ?>" as well). The first regex
> matches ":>literal..." and the second regex matches "<:code...". The test
> document contains a single ":>" at the beginning and then a long XML text.
> 
> Even if this problem is caused by a bad regex, I don't think it should end
> up in such a deep recursion. In addition, I would expect the regex:
>
>    \G ( [:?]) > ( (?: [^<]+ | < [^:?]) * )
>
> to be quite linear without much recursion (many regexes optimized for
> speed have this form, I think), which gave me enough strength to file this
> as a bug report ;)
> 
> 
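
To make the quoted description concrete, here is a minimal sketch of how the two regexes from the report alternate over a short input, splitting it into ":>literal..." and "<:code..." chunks. The helper name split_template and the sample string are hypothetical and not part of the original report; only the two patterns are taken from it.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper (not from the report): walk the string with the two
# regexes from the report and collect [kind, delimiter, text] chunks.
sub split_template {
    my ($data) = @_;
    my @chunks;
    for (;;) {
        # ":>literal...": text up to (not including) the next "<:" or "<?"
        $data =~ /\G([:?])>((?:[^<]+|<[^:?])*)/xgcs or last;
        push @chunks, [ literal => $1, $2 ];
        # "<:code...": text up to (not including) the closing ":>" or "?>"
        $data =~ /\G<([:?])((?:[^:?]+|[:?][^>])*)/gcs or last;
        push @chunks, [ code => $1, $2 ];
    }
    return @chunks;
}

# Sample input is an assumption chosen for illustration.
my @chunks = split_template(':>Hello <b>world</b> <: print 1 :> bye');
printf "%-7s [%s]\n", $_->[0], $_->[2] for @chunks;
```

Because both matches use \G together with /gc, each one resumes where the previous one stopped, and a failed match leaves pos() untouched, so the loop simply ends when no further chunk can be matched.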

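This problem has been resolved with change #27598. The installed perl
still segfaults on the test script below, while the ./perl built in
perl-current runs it to completion: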

steve@kirk:~/smoke/perl-current$ perl rt_22051.pl 
Segmentation fault
steve@kirk:~/smoke/perl-current$ ./perl rt_22051.pl 
steve@kirk:~/smoke/perl-current$ cat rt_22051.pl
#!perl -w

$/="";
my $data = my $text = do { local( $/ ) ; <DATA> } ;
for(;;) {
   $data =~ /\G([:?])>((?:[^<]+|<[^:?])*)/xgcs or last;
   $data =~ /\G<([:?])((?:[^:?]+|[:?][^>])*)/gcs or last;
}

__DATA__
:><?xml version="1.0"?>
<!DOCTYPE PLAY SYSTEM "play.dtd">

<PLAY>
<TITLE>The Tragedy of Macbeth</TITLE>
...
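
Since the test data is elided above, the following is a hedged sketch of a self-contained variant that builds a synthetic input of the same shape (a single ":>" followed by many XML-like lines) and runs the first regex over it. The helper names build_data and literal_length, the line content, and the line count are all assumptions for illustration. Whether a given size triggers the crash on an unpatched perl depends on the available stack and has not been verified here.

```perl
#!perl
use strict;
use warnings;

# Synthetic stand-in for the elided test document (an assumption, not the
# original data): a leading ":>" followed by $lines short XML-like lines.
sub build_data {
    my ($lines) = @_;
    return ':>' . ("<LINE>some text</LINE>\n" x $lines);
}

# Run the first regex from the report and return the length of the
# captured literal, or undef if the regex does not match.
sub literal_length {
    my ($data) = @_;
    $data =~ /\G([:?])>((?:[^<]+|<[^:?])*)/xgcs or return;
    return length $2;
}

my $data = build_data(5_000);    # line count chosen arbitrarily
my $len  = literal_length($data);
print "input: ", length($data), " bytes, literal matched: ",
      (defined $len ? $len : 'none'), " bytes\n";
```

Each synthetic line needs several passes through the (?: ... )* group, so the number of group iterations grows linearly with the input, which is the growth pattern the backtrace in the report shows as nested S_regmatch frames.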


