
[perl #24936] severe regexp performance problem with perl 5.8.*

From: perlbug-followup
Date: January 18, 2004 17:18
Subject: [perl #24936] severe regexp performance problem with perl 5.8.*
Message ID: rt-3.0.8-24936-70125.0.564342159163189@perl.org
# New Ticket Created by  kai+perl@conti.nu 
# Please include the string:  [perl #24936]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=24936 >



This is a bug report for perl from kai+perl@conti.nu,
generated with the help of perlbug 1.34 running under perl v5.8.3.


-----------------------------------------------------------------
[Please enter your report here]

A severe regexp performance problem seems to exist in perl 5.8.*.
Platforms this was reproduced on:
- FreeBSD/i386 4.8R, perl 5.8.0 on Pentium-M/1.4GHz under VMware Workstation 4.0.5
  stock install of Perl under this distribution of FreeBSD.
- BSD/OS 4.1 (BSDI), perl 5.8.2 and 5.8.3 on Celeron/533MHz and PIII/550MHz
  default install of perl 5.8.2 and 5.8.3 per INSTALL file.

While upgrading from perl 5.005p3 to 5.8.2, some existing applications
seemed to take a severe performance hit in their central loops
containing a number of regexps.

Upon closer examination, it was determined that certain regexps using
".*" constructs seem to execute more than 100 times slower than in
perl 5.005p3, resulting in multiple cascading failures in these applications.

Example:

$line = 'Jan 16 15:56:37 sonet sendmail[9368]: i0CGl7a1015852: to=<abuse@tiscali.be>,<abuse@tiscalinet.be>, ctladdr=<spamshield@conti.nu> (100/101), delay=4+04:09:29, xdelay=00:00:00, mailer=esmtp, pri=17436067, relay=mailer.tiscali.be., dsn=4.0.0,stat=Deferred: mailer.tiscali.be.: Network is unreachable';

Regexps that execute at a ridiculously slow pace:
A) $line =~ /(?i).*(dable|z).*$/ ; # needs 33 ms to execute! 29 loop runs/s
   (note how the alternate strings 'dable' and 'z' do not occur in $line)
B) $line =~ /(?i).*?(dable|z).*?$/ ; # needs 33 ms to execute! 29 loop runs/s
C) $line =~ /(?i).*?(?:dable|z).*?$/ ; # needs 33 ms to execute! 29 loop runs/s

Compare to:
D) $line =~ /(?i).*(able|z).*$/ ; # needs 0.07 ms to execute, 2700 loop runs/sec.
E) $line =~ /(?i).*(dable|z)$/  ; # needs 0.2 ms to execute, 1500 loop runs/sec.
F) $line =~ /(?i)(dable|z).*$/  ; # needs 0.2 ms to execute, 1500 loop runs/sec.
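
The timings above could be reproduced with a small harness along these
lines. This is only a sketch: the time_pattern helper and the run count of
200 are assumptions and not part of the original test, and absolute numbers
will differ by machine and perl version. It also prints whether each pattern
matches; with this $line only D) matches, because 'unreachable' contains
'able'.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # sub-second wall-clock timer

# Log line from the report.
my $line = 'Jan 16 15:56:37 sonet sendmail[9368]: i0CGl7a1015852: to=<abuse@tiscali.be>,<abuse@tiscalinet.be>, ctladdr=<spamshield@conti.nu> (100/101), delay=4+04:09:29, xdelay=00:00:00, mailer=esmtp, pri=17436067, relay=mailer.tiscali.be., dsn=4.0.0,stat=Deferred: mailer.tiscali.be.: Network is unreachable';

# Patterns A) through F), precompiled so only matching is timed.
my @patterns = (
    [ A => qr/(?i).*(dable|z).*$/ ],
    [ B => qr/(?i).*?(dable|z).*?$/ ],
    [ C => qr/(?i).*?(?:dable|z).*?$/ ],
    [ D => qr/(?i).*(able|z).*$/ ],
    [ E => qr/(?i).*(dable|z)$/ ],
    [ F => qr/(?i)(dable|z).*$/ ],
);

# Hypothetical helper: run $re against $str $runs times and return
# (1 or 0 for match/no match, average seconds per attempt).
sub time_pattern {
    my ($re, $str, $runs) = @_;
    my $matched = 0;
    my $start   = time;
    for (1 .. $runs) {
        $matched = ($str =~ $re) ? 1 : 0;
    }
    return ($matched, (time - $start) / $runs);
}

for my $p (@patterns) {
    my ($name, $re) = @$p;
    my ($matched, $secs) = time_pattern($re, $line, 200);
    printf "%s) %-8s %.4f ms/run\n",
        $name, ($matched ? 'match' : 'no match'), $secs * 1000;
}
```

Varying the length of $line (for example by truncating it with substr)
should show how strongly A) through C) depend on it.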

While the leading .* is at worst redundant, the behaviour gets outright
bizarre: the performance of A) through C) depends strongly on the length
of $line and on whether the ()-enclosed alternate strings occur in it.

Also noteworthy: why does D) perform so much better than E) and F)?

We can safely assume that LOTS of less-than-perfectly designed regexps
like the ones above exist in the field.
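
For regexps of this shape that can be edited, dropping the redundant
leading and trailing .* should give the same yes/no answer on single-line
input. The following is a sketch under that assumption, not a fix for the
underlying problem; the names $slow, $fast, @samples and same_verdict are
made up for illustration.

```perl
use strict;
use warnings;

my $slow = qr/(?i).*(dable|z).*$/;    # pattern A) from the report
my $fast = qr/(?:dable|z)/i;          # same alternation, no .* wrappers

# Hypothetical single-line sample inputs (no embedded newlines).
my @samples = (
    'Network is unreachable',
    'message is readable',
    'timezone offset',
    'READABLE',
    '',
);

# True if both patterns agree on whether $str matches.
sub same_verdict {
    my ($str) = @_;
    my $s = ($str =~ $slow) ? 1 : 0;
    my $f = ($str =~ $fast) ? 1 : 0;
    return $s == $f;
}

for my $str (@samples) {
    printf "%-26s %s\n", "'$str'", same_verdict($str) ? 'agree' : 'DIFFER';
}
```

Two caveats: if the capture in $1 is needed, keeping the parentheses in the
shortened form captures the first occurrence rather than the last one that
the greedy .* finds, and strings with embedded newlines can behave
differently because . does not match a newline by default.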

Thanks for your consideration,

Kai Schlichting
kai+perl@conti.nu
 


[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=critical
---
Site configuration information for perl v5.8.3:

Configured by kai at Thu Jan 15 23:51:39 EST 2004.

Summary of my perl5 (revision 5.0 version 8 subversion 3) configuration:
  Platform:
    osname=bsdos, osvers=4.1, archname=i386-bsdos
    uname='bsdos sonet.conti.nu 4.1 bsdi bsdos 4.1 kernel #8: fri sep 5 11:44:24 edt 2003 root@sonet.conti.nu:usrsrcsyscompilesonet i386 '
    config_args='-de'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -I/usr/local/include',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='egcs-2.91.66 19990314 (egcs-1.1.2 release)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='ld', ldflags =' -L/usr/X11/lib -L/usr/local/lib'
    libpth=/usr/local/lib /usr/shlib /shlib /lib /usr/lib /usr/X11/lib
    libs=-lutil -lbind -ldl -lm -lc
    perllibs=-lutil -lbind -ldl -lm -lc
    libc=/shlib/libc.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='  -Wl,-rpath,/usr/local/lib/perl5/5.8.3/i386-bsdos/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -x  -L/usr/X11/lib -L/usr/local/lib'

Locally applied patches:
    

---
@INC for perl v5.8.3:
    /usr/local/lib/perl5/5.8.3/i386-bsdos
    /usr/local/lib/perl5/5.8.3
    /usr/local/lib/perl5/site_perl/5.8.3/i386-bsdos
    /usr/local/lib/perl5/site_perl/5.8.3
    /usr/local/lib/perl5/site_perl/5.8.2/i386-bsdos
    /usr/local/lib/perl5/site_perl/5.8.2
    /usr/local/lib/perl5/site_perl
    .

---
Environment for perl v5.8.3:
    HOME=/usr/home/kai
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/bin:/usr/bin:/usr/contrib/bin:/usr/X11/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/X11R6/bin:/usr/X11/bin:/usr/local/bin:/usr/TeX/bin:/bin:/usr/games:/usr/contrib/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/X11R6/bin:/usr/X11/bin:/usr/local/bin:/usr/TeX/bin:/bin:/usr/games
    PERL_BADLANG (unset)
    SHELL=/bin/bash
