[perl #24936] severe regexp performance problem with perl 5.8.*
From: perlbug-followup
Date: January 18, 2004 17:18
Subject: [perl #24936] severe regexp performance problem with perl 5.8.*
Message ID: rt-3.0.8-24936-70125.0.564342159163189@perl.org
# New Ticket Created by kai+perl@conti.nu
# Please include the string: [perl #24936]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=24936 >
This is a bug report for perl from kai+perl@conti.nu,
generated with the help of perlbug 1.34 running under perl v5.8.3.
-----------------------------------------------------------------
[Please enter your report here]
A severe regexp performance problem seems to exist in perl 5.8.*
Platforms on which this was reproduced:
- FBSD-i386 4.8R, perl 5.8.0 on a Pentium-M/1.4GHz under VMware Workstation 4.0.5;
  stock install of Perl under this distribution of FBSD.
- BSD/OS 4.1 (BSDI), perl 5.8.2 and 5.8.3 on a Celeron/533MHz and a PIII/550MHz;
  default install of perl 5.8.2 and 5.8.3 per the INSTALL file.
While upgrading from perl 5.005p3 to 5.8.2, some existing applications
seemed to take a severe performance hit in their central loops
containing a number of regexps.
Upon closer examination, it was determined that certain regexps using
".*" constructs seem to execute more than 100 times slower than in
perl 5.005p3, resulting in multiple cascading failures in these applications.
Example:
$line = 'Jan 16 15:56:37 sonet sendmail[9368]: i0CGl7a1015852: to=<abuse@tiscali.be>,<abuse@tiscalinet.be>, ctladdr=<spamshield@conti.nu> (100/101), delay=4+04:09:29, xdelay=00:00:00, mailer=esmtp, pri=17436067, relay=mailer.tiscali.be., dsn=4.0.0,stat=Deferred: mailer.tiscali.be.: Network is unreachable';
Regexps that execute at a ridiculously slow pace:
A) $line =~ /(?i).*(dable|z).*$/ ; # needs 33 ms to execute! 29 loop runs/sec.
(note that the alternate strings 'dable' and 'z' do not occur in $line)
B) $line =~ /(?i).*?(dable|z).*?$/ ; # needs 33 ms to execute! 29 loop runs/sec.
C) $line =~ /(?i).*?(?:dable|z).*?$/ ; # needs 33 ms to execute! 29 loop runs/sec.
Compare to:
D) $line =~ /(?i).*(able|z).*$/ ; # needs 0.07 ms to execute, 2700 loop runs/sec.
E) $line =~ /(?i).*(dable|z)$/ ; # needs 0.2 ms to execute, 1500 loop runs/sec.
F) $line =~ /(?i)(dable|z).*$/ ; # needs 0.2 ms to execute, 1500 loop runs/sec.
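The comparison above could be reproduced with something like the following sketch, which uses the core Benchmark module. This is not the script behind the figures in this report: the use of qr// objects, the %pattern and %test hashes and the matches() helper are assumptions made for illustration, and timings will differ by Perl version and hardware. One thing it makes visible is that 'able' does occur in $line (in "unreachable"), so D) is a successful match while the other five fail.

```perl
#!/usr/bin/perl
# Sketch only: times patterns A) through F) from this report against $line.
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = 'Jan 16 15:56:37 sonet sendmail[9368]: i0CGl7a1015852: to=<abuse@tiscali.be>,<abuse@tiscalinet.be>, ctladdr=<spamshield@conti.nu> (100/101), delay=4+04:09:29, xdelay=00:00:00, mailer=esmtp, pri=17436067, relay=mailer.tiscali.be., dsn=4.0.0,stat=Deferred: mailer.tiscali.be.: Network is unreachable';

# The six patterns, written as in the report but precompiled with qr//
# (an assumption of this sketch; the report used literal matches).
my %pattern = (
    A => qr/(?i).*(dable|z).*$/,
    B => qr/(?i).*?(dable|z).*?$/,
    C => qr/(?i).*?(?:dable|z).*?$/,
    D => qr/(?i).*(able|z).*$/,
    E => qr/(?i).*(dable|z)$/,
    F => qr/(?i)(dable|z).*$/,
);

# Hypothetical helper: does the named pattern match $line at all?
# A failed match and a successful match exercise the engine differently.
sub matches {
    my ($name) = @_;
    return $line =~ $pattern{$name} ? 1 : 0;
}

for my $name (sort keys %pattern) {
    printf "%s: %s\n", $name, matches($name) ? 'match' : 'no match';
}

# Wrap each pattern in a closure so Benchmark can call it repeatedly.
my %test;
for my $name (keys %pattern) {
    my $re = $pattern{$name};
    $test{$name} = sub { my $ok = $line =~ $re; return $ok };
}

# A negative count asks Benchmark to run each test for at least
# that many CPU seconds, then prints a comparison table.
cmpthese(-1, \%test);
```

Running the same script under each installed perl binary gives a side-by-side view of the rates per version.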
While a leading .* is redundant at most, the behaviour gets outright bizarre:
the performance of A) through C) depends strongly on the length of $line
and on whether the ()-enclosed alternate strings occur in it.
Also noteworthy: why does D) perform so much better than E) and F)?
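The dependence on the length of $line can be probed with a sketch like the one below, which times pattern A) against inputs of growing length that contain neither alternative. The filler text, the lengths, the iteration count and the seconds_per_match() helper are all assumptions chosen for illustration; Time::HiRes is a core module.

```perl
#!/usr/bin/perl
# Sketch only: how does the cost of a failing match grow with input length?
use strict;
use warnings;
use Time::HiRes qw(time);

# Pattern A) from the report.
my $re = qr/(?i).*(dable|z).*$/;

# Hypothetical helper: average wall-clock seconds per match attempt.
sub seconds_per_match {
    my ($text, $iterations) = @_;
    my $start = time();
    for (1 .. $iterations) {
        my $ok = $text =~ $re;
    }
    return (time() - $start) / $iterations;
}

# Hypothetical filler: a fragment of the log line that contains
# neither 'dable' nor 'z', so every match attempt fails.
my $filler = 'mailer=esmtp, ';

for my $copies (10, 20, 40, 80) {
    my $text = $filler x $copies;
    printf "%5d chars: %.6f ms per match\n",
        length($text), 1000 * seconds_per_match($text, 50);
}
```

If the time per match roughly quadruples each time the length doubles, the matching cost is growing quadratically rather than linearly with the input.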
We can safely assume that LOTS of less than perfectly designed regexps
exist in the field, matching the above.
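For regexps of this shape that are only used as a yes/no test, one possible rewrite is to drop the redundant .* parts. The sketch below compares pattern A) with such a simplified form; the helper names and sample strings are hypothetical, and the equivalence is assumed only for single-line input without embedded newlines.

```perl
#!/usr/bin/perl
# Sketch only: pattern A) versus a form without the redundant .* parts.
use strict;
use warnings;

# Original form, pattern A) from the report.
sub matches_original {
    my ($text) = @_;
    return $text =~ /(?i).*(dable|z).*$/ ? 1 : 0;
}

# Simplified form: same alternatives, no leading or trailing .*
sub matches_simplified {
    my ($text) = @_;
    return $text =~ /(?:dable|z)/i ? 1 : 0;
}

# Hypothetical sample strings, all without newlines.
for my $text ('Network is unreachable', 'not affordable', 'Zdable') {
    printf "%-26s original=%d simplified=%d\n",
        "'$text'", matches_original($text), matches_simplified($text);
}
```

The two forms are not interchangeable when the capture is used: with a greedy leading .* the group captures the last occurrence in the string, while the simplified form captures the first.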
Thanks for your consideration,
Kai Schlichting
kai+perl@conti.nu
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=critical
---
Site configuration information for perl v5.8.3:
Configured by kai at Thu Jan 15 23:51:39 EST 2004.
Summary of my perl5 (revision 5.0 version 8 subversion 3) configuration:
Platform:
osname=bsdos, osvers=4.1, archname=i386-bsdos
uname='bsdos sonet.conti.nu 4.1 bsdi bsdos 4.1 kernel #8: fri sep 5 11:44:24 edt 2003 root@sonet.conti.nu:usrsrcsyscompilesonet i386 '
config_args='-de'
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-fno-strict-aliasing -I/usr/local/include',
optimize='-O2',
cppflags='-fno-strict-aliasing -I/usr/local/include'
ccversion='', gccversion='egcs-2.91.66 19990314 (egcs-1.1.2 release)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='ld', ldflags =' -L/usr/X11/lib -L/usr/local/lib'
libpth=/usr/local/lib /usr/shlib /shlib /lib /usr/lib /usr/X11/lib
libs=-lutil -lbind -ldl -lm -lc
perllibs=-lutil -lbind -ldl -lm -lc
libc=/shlib/libc.so, so=so, useshrplib=true, libperl=libperl.so
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' -Wl,-rpath,/usr/local/lib/perl5/5.8.3/i386-bsdos/CORE'
cccdlflags='-fPIC', lddlflags='-shared -x -L/usr/X11/lib -L/usr/local/lib'
Locally applied patches:
---
@INC for perl v5.8.3:
/usr/local/lib/perl5/5.8.3/i386-bsdos
/usr/local/lib/perl5/5.8.3
/usr/local/lib/perl5/site_perl/5.8.3/i386-bsdos
/usr/local/lib/perl5/site_perl/5.8.3
/usr/local/lib/perl5/site_perl/5.8.2/i386-bsdos
/usr/local/lib/perl5/site_perl/5.8.2
/usr/local/lib/perl5/site_perl
.
---
Environment for perl v5.8.3:
HOME=/usr/home/kai
LANG (unset)
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/bin:/usr/bin:/usr/contrib/bin:/usr/X11/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/X11R6/bin:/usr/X11/bin:/usr/local/bin:/usr/TeX/bin:/bin:/usr/games:/usr/contrib/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/X11R6/bin:/usr/X11/bin:/usr/local/bin:/usr/TeX/bin:/bin:/usr/games
PERL_BADLANG (unset)
SHELL=/bin/bash