
[perl #22051] segfault (deep recursion?) in regex match

From: Marc Lehmann
Date: April 28, 2003 07:43
Subject: [perl #22051] segfault (deep recursion?) in regex match
Message ID: rt-22051-56081.3.1067383287423@bugs6.perl.org

# New Ticket Created by  Marc Lehmann 
# Please include the string:  [perl #22051]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt2/Ticket/Display.html?id=22051 >



This is a bug report for perl from root@cerebro.laendle,
generated with the help of perlbug 1.34 running under perl v5.8.1.


-----------------------------------------------------------------
[Please enter your report here]

Due to a bug I once fed the wrong text into a regex and earned... a
segfault, in a function that had seemed to segfault occasionally before,
although I never found a good test case.

This test case isn't very good either, because it seems to require a big
document, which I put on my web server so that I didn't need to attach it.

Here is the program, which segfaults both with perl-5.8.0 from Debian and
with my own perl-5.8.1 MAINT19040:

   # just get the test data into $data
   use LWP::Simple;
   $data = get "http://data.plan9.de/macbeth.xml";

   # the segfault occurs on the second round (i think) in the first regex.
   for(;;) {
      $data =~ /\G([:?])>((?:[^<]+|<[^:?])*)/xgcs or last;
      $data =~ /\G<([:?])((?:[^:?]+|[:?][^>])*)/gcs or last;
   }

When I run this program, I get a segfault because of very deep recursion:

   #0  S_regmatch (prog=0x81252a8) at regexec.c:2237
   #1  0x080e605e in S_regmatch (prog=0x8125318) at regexec.c:3244
   #2  0x080e5068 in S_regmatch (prog=0x81252a8) at regexec.c:3789
   #3  0x080e605e in S_regmatch (prog=0x81252a8) at regexec.c:3244
   #4  0x080e605e in S_regmatch (prog=0x8125318) at regexec.c:3244
   #5  0x080e5068 in S_regmatch (prog=0x81252a8) at regexec.c:3789
   #6  0x080e605e in S_regmatch (prog=0x81252a8) at regexec.c:3244
   ...
   #18941 0x080e5068 in S_regmatch (prog=0x81252a8) at regexec.c:3789
   #18942 0x080e605e in S_regmatch (prog=0x8125318) at regexec.c:3244
   #18943 0x080e5cef in S_regmatch (prog=0x8125250) at regexec.c:3079
   #18944 0x080e2c4f in S_regtry (prog=0x812520c, startpos=0x8431819 "?>\n<!DOCTYPE PLAY SYSTEM \"play.dtd\">\n\n<PLAY>\n<TITLE>The Tragedy of Macbeth</TITLE>\n\n<FM>\n<P>Text placed in th

I don't know whether this is a bug or the document is simply too long to be
matched by a regex (which might be suboptimal, although I tried to make it
perform OK). In case you wonder, it is used to match "<: code :>" sections
inside some other text (optionally "<? code ?>" as well). The first regex
matches ":>literal..." and the second regex matches "<:code...". The test
document contains a single ":>" at the beginning and then a long XML text.
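
To make the intent of the two regexes concrete, here is a minimal sketch
that runs the same loop over a tiny inline string instead of the large
document. The input string and the @parts list are made up for
illustration only; nothing here is expected to reproduce the segfault.

```perl
use strict;
use warnings;

# Hypothetical miniature input; the real report used a large XML file.
my $data = ":>Hello <b>world</b> <: print 1 :> bye <? print 2 ?> end";

my @parts;
for (;;) {
    # ":>literal..." or "?>literal...": text up to the next "<:" or "<?"
    $data =~ /\G([:?])>((?:[^<]+|<[^:?])*)/xgcs or last;
    push @parts, [ literal => $2 ];

    # "<:code..." or "<?code...": code up to the next ":>" or "?>"
    $data =~ /\G<([:?])((?:[^:?]+|[:?][^>])*)/gcs or last;
    push @parts, [ code => $2 ];
}

# One line per piece, in document order.
printf "%-7s [%s]\n", @$_ for @parts;
```

The loop alternates between literal text and code sections, and /gc keeps
pos() in place when a match fails, so each regex continues exactly where
the previous one stopped.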

Even if this problem is caused by a bad regex, I don't think it should end
up in such a big recursion. In addition, I would expect the regex:

   \G ( [:?]) > ( (?: [^<]+ | < [^:?]) * )

to be quite linear without much recursion (many regexes optimized for
speed have this form, I think), which gave me enough strength to file this
as a bug report ;)
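
A possible workaround, sketched under the assumption that the recursion
depth grows with the number of iterations of the outer (?: ... )* group
(which the backtrace suggests but which I have not verified): move that
repetition out of the regex and into a Perl loop, so that each individual
match only has to consume one chunk. The helper name match_literal and the
synthetic input are hypothetical and stand in for the real code and
document.

```perl
use strict;
use warnings;

# Hypothetical helper: consume ":>literal..." (or "?>literal...") at
# pos($$ref) and return (delimiter, literal), or an empty list if the
# delimiter is not found at the current position.
sub match_literal {
    my ($ref) = @_;
    $$ref =~ /\G([:?])>/gc or return;
    my $delim = $1;
    my $start = pos($$ref);

    # One chunk per match: the repetition is a Perl loop, not a regex '*'.
    # /c keeps pos() unchanged when the final attempt fails.
    1 while $$ref =~ /\G(?:[^<]+|<[^:?])/gcs;

    return ($delim, substr($$ref, $start, pos($$ref) - $start));
}

# Synthetic stand-in for the large XML document (an assumption).
my $data = ":>" . ("<LINE>some text</LINE>\n" x 1000) . "<: code :>";

my ($delim, $literal) = match_literal(\$data);
print "delimiter=$delim literal_length=", length($literal), "\n";
# prints: delimiter=: literal_length=23000
```

Because nothing follows the group in the original regex, the loop should
stop at the same position as the single-regex form, leaving pos() at the
next "<:" or "<?" so the second regex can continue from there.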


[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=medium
---
Site configuration information for perl v5.8.1:

Configured by root at Sat Mar 22 16:16:52 CET 2003.

Summary of my perl5 (revision 5.0 version 8 subversion 1 patch 19045) configuration:
  Platform:
    osname=linux, osvers=2.4, archname=i686-linux
    uname='linux cerebro 2.4.18-pre8-ac3 #2 smp tue feb 5 17:35:23 cet 2002 i686 unknown '
    config_args=''
    hint=previous, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-I/opt/include -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-Os -funroll-loops -mcpu=pentium -march=pentium -g',
    cppflags='-I/opt/include -D_GNU_SOURCE -I/opt/include -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/opt/include -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    ccversion='', gccversion='3.2.3 20030316 (Debian prerelease)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =''
    libpth=/usr/lib /opt/lib
    libs=-lcrypt -ldl -lm -lc
    perllibs=-lcrypt -ldl -lm -lc
    libc=/lib/libc-2.2.5.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared'

Locally applied patches:
    MAINT19040

---
@INC for perl v5.8.1:
    /root/src/sex
    /opt/perl/lib/perl5
    /opt/perl/lib/perl5
    /opt/perl/lib/perl5
    /opt/perl/lib/perl5
    .

---
Environment for perl v5.8.1:
    HOME=/root
    LANG (unset)
    LANGUAGE (unset)
    LC_CTYPE=de_DE@euro
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/root/s2:/root/s:/opt/qt/bin:/opt/bin:/opt/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/X11/bin:/usr/games:/usr/local/bin:/usr/local/sbin:.:/root/cc/dejagnu/bin
    PERL5LIB=/root/src/sex
    PERL5_CPANPLUS_CONFIG=/root/.cpanplus/config
    PERLDB_OPTS=ornaments=0
    PERL_BADLANG (unset)
    SHELL=/bin/bash



