
[perl #38379] Segmentation fault for matching too long regexps

From: Lukasz Debowski
Date: January 31, 2006 05:18
Subject: [perl #38379] Segmentation fault for matching too long regexps
Message ID: rt-3.0.11-38379-129036.1.76245616450075@perl.org

# New Ticket Created by  Lukasz Debowski 
# Please include the string:  [perl #38379]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/rt3/Ticket/Display.html?id=38379 >



This is a bug report for perl from ldebowsk@ipipan.waw.pl,
generated with the help of perlbug 1.35 running under perl v5.8.4.


-----------------------------------------------------------------
[Please enter your report here]

Dear Recipients,

I have observed that Perl's regular expression matching operator causes a
segmentation fault when the string it is trying to match is too long. I
consider this a bug because no information is given about the cause of the
fault or its location in the script. Compare the output of two simple scripts:

======= SCRIPT #1 =======

ldebowsk@mises:~$ perl -e '$_="<w>ab"; while(1){ $_=$_."<11>1>a>b>1>c>d";
if(/^<w>[^<>]+(<[01][01](>[1-9][0-9]*>[^><]+>[^><]+)+)+$/){print "$k\n";}
$k++;}'
1
2
...
1903
1904
Naruszenie ochrony pamięci (i.e. "Segmentation fault" in Polish)
ldebowsk@mises:~$

======= SCRIPT #2 =======

ldebowsk@mises:~$ perl -e "0/0;"
Illegal division by zero at -e line 1.
ldebowsk@mises:~$

=========================

I think it would be much nicer if script #1 produced a regular
Perl error message, for example "Exceeding run-time
memory when matching REGEXP at -e line 1".

It took me far too long to find out that the overly long match was the
cause of the segmentation fault. I came across this behavior when
using two scripts for natural language processing. The first one
produced a kind of part-of-speech annotated corpus from plain
text, and the second one validated the format of the corpus. The
pattern in the validating script was exactly the one in the
if-condition of script #1. For typical language data, the matched
string consisted of fewer than 100 consecutive segments of type
(>[1-9][0-9]*>[^><]+>[^><]+)+), but when I ran the part-of-speech
annotating script on some weird data, it unexpectedly produced a string
consisting of more than 1000 consecutive segments of the same type.
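
One possible workaround for the validating script, shown only as a sketch, is to check the string piece by piece with \G and the /gc modifiers instead of one pattern with nested quantifiers. Each individual match is then short, however many segments the string contains. The helper name valid_line is hypothetical and not part of the original scripts. The sketch assumes the pieces of the pattern cannot overlap, which holds here because [^<>]+ never crosses a "<" or ">".

```perl
use strict;
use warnings;

# Hypothetical helper: accepts the same strings as
#   /^<w>[^<>]+(<[01][01](>[1-9][0-9]*>[^><]+>[^><]+)+)+$/
# but consumes the input one piece at a time.
sub valid_line {
    my ($s) = @_;
    pos($s) = 0;

    # Leading "<w>" followed by at least one non-bracket character.
    $s =~ /\G<w>[^<>]+/gc or return 0;

    my $segments = 0;
    # Outer group: "<" plus two binary digits.
    while ($s =~ /\G<[01][01]/gc) {
        my $items = 0;
        # Inner group: ">number>text>text", one or more times.
        $items++ while $s =~ /\G>[1-9][0-9]*>[^><]+>[^><]+/gc;
        return 0 unless $items;
        $segments++;
    }

    # Like "$", allow one optional trailing newline before the end.
    return ($segments > 0 && $s =~ /\G\n?\z/gc) ? 1 : 0;
}

# Same test string as in script #1, grown to 100000 segments.
my $line = "<w>ab" . ("<11>1>a>b>1>c>d" x 100_000);
print valid_line($line) ? "valid\n" : "invalid\n";
```

Because /c keeps the match position after a failed attempt, every piece continues exactly where the previous one stopped, so the loop never rescans the string.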

Kind regards,

Lukasz Debowski
ldebowsk@ipipan.waw.pl

*** www.ipipan.waw.pl/~ldebowsk ***


[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
     category=core
     severity=low
---
Site configuration information for perl v5.8.4:

Configured by Debian Project at Tue Mar  8 20:31:23 EST 2005.

Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
   Platform:
     osname=linux, osvers=2.4.27-ti1211, archname=i386-linux-thread-multi
     uname='linux kosh 2.4.27-ti1211 #1 sun sep 19 18:17:45 est 2004 i686 
gnulinux '
     config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN 
-Dcccdlflags=-fPIC -Darchname=i386-linux -Dprefix=/usr 
-Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 
-Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 
-Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local 
-Dsitelib=/usr/local/share/perl/5.8.4 -Dsitearch=/usr/local/lib/perl/5.8.4 
-Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 
-Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 
-Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh 
-Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.4 -Dd_dosuid -des'
     hint=recommended, useposix=true, d_sigaction=define
     usethreads=define use5005threads=undef useithreads=define 
usemultiplicity=define
     useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
     use64bitint=undef use64bitall=undef uselongdouble=undef
     usemymalloc=n, bincompat5005=undef
   Compiler:
     cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS 
-DDEBIAN -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE 
-D_FILE_OFFSET_BITS=64',
     optimize='-O2',
     cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN 
-fno-strict-aliasing -I/usr/local/include'
     ccversion='', gccversion='3.3.5 (Debian 1:3.3.5-9)', gccosandvers=''
     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
     ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', 
lseeksize=8
     alignbytes=4, prototype=define
   Linker and Libraries:
     ld='cc', ldflags =' -L/usr/local/lib'
     libpth=/usr/local/lib /lib /usr/lib
     libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
     perllibs=-ldl -lm -lpthread -lc -lcrypt
     libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, 
libperl=libperl.so.5.8.4
     gnulibc_version='2.3.2'
   Dynamic Linking:
     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
     cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:


---
@INC for perl v5.8.4:
     /home/ldebowsk/PM2M//share
     /home/ldebowsk/PM2M//guess
     /home/ldebowsk/perl
     /etc/perl
     /usr/local/lib/perl/5.8.4
     /usr/local/share/perl/5.8.4
     /usr/lib/perl5
     /usr/share/perl5
     /usr/lib/perl/5.8
     /usr/share/perl/5.8
     /usr/local/lib/site_perl
     .

---
Environment for perl v5.8.4:
     HOME=/home/ldebowsk
     LANG=pl_PL
     LANGUAGE=pl_PL:pl:en_GB:en
     LD_LIBRARY_PATH (unset)
     LOGDIR (unset)

PATH=/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/home/ldebowsk/:/home/ldebowsk/123/:/home/ldebowsk/perl/convert/:/home/ldebowsk/perl/scripts_QL/TypeToken/:/home/ldebowsk/perl/scripts_QL/cfg/:/home/ldebowsk/perl/scripts_QL/extract/:/home/ldebowsk/PM2M//admin_exes:/home/ldebowsk/PM2M//tagging_exes:/home/ldebowsk/PM2M//training_exes:/usr/local/j2sdk1.4.2_03/bin:/home/ldebowsk/my_cvs//fcqp/text/wakiki/bp/:/home/ldebowsk/my_cvs//fcqp/text/wakiki/shell/:/home/ldebowsk/my_cvs//fcqp/gui/src/

PERLLIB=:/home/ldebowsk/PM2M//share:/home/ldebowsk/PM2M//guess:/home/ldebowsk/perl
     PERL_BADLANG (unset)
     SHELL=/bin/bash




