[perl #22051] segfault (deep recursion?) in regex match
From: Marc Lehmann
Date: April 28, 2003 07:43
Subject: [perl #22051] segfault (deep recursion?) in regex match
Message ID: rt-22051-56081.3.1067383287423@bugs6.perl.org
# New Ticket Created by Marc Lehmann
# Please include the string: [perl #22051]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt2/Ticket/Display.html?id=22051 >
This is a bug report for perl from root@cerebro.laendle,
generated with the help of perlbug 1.34 running under perl v5.8.1.
-----------------------------------------------------------------
[Please enter your report here]
Due to a bug, I once fed the wrong text into a regex and earned... a
segfault, in a function that seemed to segfault occasionally before but for
which I never found a good testcase.
This testcase isn't very good either, because it seems to require a big
document, which I put on my webserver so that I didn't need to attach it.
Here is the program that segfaults with both perl-5.8.0 from debian as
well as with my own perl-5.8.1 MAINT19040:
# just get the test data into $data
use LWP::Simple;
$data = get "http://data.plan9.de/macbeth.xml";
# the segfault occurs on the second round (I think) in the first regex.
for(;;) {
    $data =~ /\G([:?])>((?:[^<]+|<[^:?])*)/xgcs or last;
    $data =~ /\G<([:?])((?:[^:?]+|[:?][^>])*)/gcs or last;
}
When I run this program, I get a segfault caused by very deep recursion:
#0 S_regmatch (prog=0x81252a8) at regexec.c:2237
#1 0x080e605e in S_regmatch (prog=0x8125318) at regexec.c:3244
#2 0x080e5068 in S_regmatch (prog=0x81252a8) at regexec.c:3789
#3 0x080e605e in S_regmatch (prog=0x81252a8) at regexec.c:3244
#4 0x080e605e in S_regmatch (prog=0x8125318) at regexec.c:3244
#5 0x080e5068 in S_regmatch (prog=0x81252a8) at regexec.c:3789
#6 0x080e605e in S_regmatch (prog=0x81252a8) at regexec.c:3244
...
#18941 0x080e5068 in S_regmatch (prog=0x81252a8) at regexec.c:3789
#18942 0x080e605e in S_regmatch (prog=0x8125318) at regexec.c:3244
#18943 0x080e5cef in S_regmatch (prog=0x8125250) at regexec.c:3079
#18944 0x080e2c4f in S_regtry (prog=0x812520c, startpos=0x8431819 "?>\n<!DOCTYPE PLAY SYSTEM \"play.dtd\">\n\n<PLAY>\n<TITLE>The Tragedy of Macbeth</TITLE>\n\n<FM>\n<P>Text placed in th
I don't know whether this is a bug or whether the document is simply too
long to be matched by this regex (which might be suboptimal, although I
tried to make it perform OK). In case you wonder, it is used to match
"<: code :>" sections inside some other text (optionally "<? code ?>" as
well). The first regex matches ":>literal..." and the second regex matches
"<:code...". The test document contains a single ":>" at the beginning,
followed by a long XML text.
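To illustrate how the two regexes are meant to alternate, here is a minimal
sketch that runs the same loop over a short, made-up input and records what
each regex captures. The sample string and the @tokens array are
assumptions added for illustration only; they are not part of the real
program or of the test document, and this input is far too small to
trigger the crash.

```perl
use strict;
use warnings;

# Hypothetical sample input: literal text with two embedded code sections.
my $data = ":>Hello <b>world</b> <: print 1 :> bye <? foo ?> end";

my @tokens;          # illustrative only: [ kind, captured text ] pairs
pos($data) = 0;

for (;;) {
    # First regex: ":>" or "?>" followed by literal text up to the
    # next "<:" or "<?" (or the end of the string).
    $data =~ /\G([:?])>((?:[^<]+|<[^:?])*)/xgcs or last;
    push @tokens, [ literal => $2 ];

    # Second regex: "<:" or "<?" followed by code up to the next
    # ":>" or "?>" (or the end of the string).
    $data =~ /\G<([:?])((?:[^:?]+|[:?][^>])*)/gcs or last;
    push @tokens, [ code => $2 ];
}

printf "%-7s [%s]\n", @$_ for @tokens;
```

Because of \G and /gc, each match resumes exactly where the previous one
stopped, so the loop ends as soon as neither regex can continue at the
current position.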
Even if this problem is caused by a bad regex, I don't think it should end
up in such a big recursion. In addition, I would expect the regex:
\G ( [:?]) > ( (?: [^<]+ | < [^:?]) * )
to be quite linear without much recursion (many regexes optimized for
speed have this form, I think), which gave me enough strength to file this
as a bug report ;)
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=medium
---
Site configuration information for perl v5.8.1:
Configured by root at Sat Mar 22 16:16:52 CET 2003.
Summary of my perl5 (revision 5.0 version 8 subversion 1 patch 19045) configuration:
Platform:
osname=linux, osvers=2.4, archname=i686-linux
uname='linux cerebro 2.4.18-pre8-ac3 #2 smp tue feb 5 17:35:23 cet 2002 i686 unknown '
config_args=''
hint=previous, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=y, bincompat5005=undef
Compiler:
cc='gcc', ccflags ='-I/opt/include -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-Os -funroll-loops -mcpu=pentium -march=pentium -g',
cppflags='-I/opt/include -D_GNU_SOURCE -I/opt/include -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/opt/include -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
ccversion='', gccversion='3.2.3 20030316 (Debian prerelease)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='gcc', ldflags =''
libpth=/usr/lib /opt/lib
libs=-lcrypt -ldl -lm -lc
perllibs=-lcrypt -ldl -lm -lc
libc=/lib/libc-2.2.5.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.3.2'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
cccdlflags='-fpic', lddlflags='-shared'
Locally applied patches:
MAINT19040
---
@INC for perl v5.8.1:
/root/src/sex
/opt/perl/lib/perl5
/opt/perl/lib/perl5
/opt/perl/lib/perl5
/opt/perl/lib/perl5
.
---
Environment for perl v5.8.1:
HOME=/root
LANG (unset)
LANGUAGE (unset)
LC_CTYPE=de_DE@euro
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/root/s2:/root/s:/opt/qt/bin:/opt/bin:/opt/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/X11/bin:/usr/games:/usr/local/bin:/usr/local/sbin:.:/root/cc/dejagnu/bin
PERL5LIB=/root/src/sex
PERL5_CPANPLUS_CONFIG=/root/.cpanplus/config
PERLDB_OPTS=ornaments=0
PERL_BADLANG (unset)
SHELL=/bin/bash