develooper Front page | perl.perl5.porters | Postings from January 2004

Re: [perl #24936] severe regexp performance problem with perl 5.8.*

Thread Previous | Thread Next
From:
Abigail
Date:
January 21, 2004 14:44
Subject:
Re: [perl #24936] severe regexp performance problem with perl 5.8.*
Message ID:
20040121224420.GI393@abigail.nl
On Sun, Jan 18, 2004 at 01:11:12AM -0000, kai+perl@conti.nu (via RT) wrote:
> # New Ticket Created by  kai+perl@conti.nu 
> # Please include the string:  [perl #24936]
> # in the subject line of all future correspondence about this issue. 
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=24936 >
> 
> 
> 
> This is a bug report for perl from kai+perl@conti.nu,
> generated with the help of perlbug 1.34 running under perl v5.8.3.
> 
> 
> -----------------------------------------------------------------
> [Please enter your report here]
> 
> A severe regexp performance problem seems to exist in perl 5.8.*
> Platforms this was reproduced on:
> - FBSD-i386 4.8R , perl 5.8.0 on Pentium-M/1.4Ghz under VMware Workstation 4.0.5
>   stock install of Perl under this distribution of FBSD.
> - BSD/OS 4.1 (BSDI), perl 5.8.2 and 5.8.3 on Celereon/533Mhz and PIII/550Mhz
>   default install of perl 5.8.2 and 5.8.3 per INSTALL file.
> 
> While upgrading from perl 5.005p3 to 5.8.2, some existing applications
> seemed to take a severe performance hit in their central loops
> containing a number of regexps.
> 
> Upon closer examination, it was determined that certain regexp's using
> ".*" constructs seem to execute more than 100 times slower than in
> perl 5.005p3, resulting in multiple cascading failures in these applications.
> 
> Example:
> 
> $line = 'Jan 16 15:56:37 sonet sendmail[9368]: i0CGl7a1015852: to=<abuse@tiscali.be>,<abuse@tiscalinet.be>, ctladdr=<spamshield@conti.nu> (100/101), delay=4+04:09:29, xdelay=00:00:00, mailer=esmtp, pri=17436067, relay=mailer.tiscali.be., dsn=4.0.0,stat=Deferred: mailer.tiscali.be.: Network is unreachable';
> 
> regexps executed performing at a ridiculously slow pace:
> A) $line =~ /(?i).*(dable|z).*$/ ; # needs 33 ms to execute! 29 loop runs/s
>    (note how the alternate strings 'dable' and 'z' do not occur in $line)
> B) $line =~ /(?i).*?(dable|z).*?$/ ; # needs 33 ms to execute! 29 loop runs/s
> C) $line =~ /(?i).*?(?:dable|z).*?$/ ; # needs 33 ms to execute! 29 loop runs/s
> 
> compare to:
> D) $line =~ /(?i).*(able|z).*$/ ; # needs 0.07 ms to execute, 2700 loop runs/sec.
> E) $line =~ /(?i).*(dable|z)$/  ; # needs 0.2 ms to execute, 1500 loop runs/sec.
> F) $line =~ /(?i)(dable|z).*$/  ; # needs 0.2 ms to execute, 1500 loop runs/sec.
> 
> While leading .* is redundant at most, the bahaviour gets outright bizarre
> given the performance of A) through C) being strongly dependent on the length
> of $line and whether the ()-enclosed alternate strings exist or not.
> 
> Also noteworthy: Why is D) performing so much better than E) and F) ?


Because D is a match. For D, the regex engine will try to match 'able|e'
starting from the end. Since the string ends at 'able', the fact the 
string matches is determined quickly. 

Here's a program that shows there's a huge slowdown between 5.005_03
and 5.6.0 and it even gets slower after 5.6.0.


#!/usr/bin/perl -w

use strict;

my @versions = qw /5.005 5.005_01 5.005_02 5.005_03
                   5.6.0 5.6.1    5.6.2
                   5.8.0 5.8.1    5.8.2    5.8.3
                   5.9.0/;

my $prog = <<'--';

use Benchmark qw /timeit timestr/;
use vars qw /$line/;

my $count = shift || 100;

$line = 'Jan 16 15:56:37 sonet sendmail[9368]: i0CGl7a1015852: '      .
        'to=<abuse@tiscali.be>,<abuse@tiscalinet.be>, '               .
        'ctladdr=<spamshield@conti.nu> (100/101), delay=4+04:09:29, ' .
        'xdelay=00:00:00, mailer=esmtp, pri=17436067, '               .
        'relay=mailer.tiscali.be., dsn=4.0.0,stat=Deferred: '         .
        'mailer.tiscali.be.: Network is unreachable';

my @regexes = (
    [A => '(?i).*(dable|z).*$'],
    [B => '(?i).*?(dable|z).*?$'],
    [C => '(?i).*?(?:dable|z).*?$'],
    [D => '(?i).*(able|z).*$'],
    [E => '(?i).*(dable|z)$'],
    [F => '(?i)(dable|z).*$'],
);


foreach my $entry (@regexes) {
    my ($name, $re) = @$entry;
    my $t = timeit $count => "\$::line =~ /$re/";
    my $run = $$t [1] + $$t [2];
    my $avg = sprintf "%.2f" => 1_000 * $run / $count;
    print "\t$name takes $avg msecs. The match was ";
    print $line =~ /$re/ ? "succesful.\n" : "not succesful.\n";
}

--

foreach my $version (@versions) {
    print "$version:\n";
    system "/opt/perl/$version/bin/perl" => '-e', $prog;
}

__END__
5.005:
	A takes 0.10 msecs. The match was not succesful.
	B takes 0.10 msecs. The match was not succesful.
	C takes 0.10 msecs. The match was not succesful.
	D takes 0.00 msecs. The match was succesful.
	E takes 0.10 msecs. The match was not succesful.
	F takes 0.10 msecs. The match was not succesful.
5.005_01:
	A takes 0.10 msecs. The match was not succesful.
	B takes 0.10 msecs. The match was not succesful.
	C takes 0.10 msecs. The match was not succesful.
	D takes 0.00 msecs. The match was succesful.
	E takes 0.10 msecs. The match was not succesful.
	F takes 0.10 msecs. The match was not succesful.
5.005_02:
	A takes 0.10 msecs. The match was not succesful.
	B takes 0.10 msecs. The match was not succesful.
	C takes 0.10 msecs. The match was not succesful.
	D takes 0.00 msecs. The match was succesful.
	E takes 0.10 msecs. The match was not succesful.
	F takes 0.10 msecs. The match was not succesful.
5.005_03:
	A takes 0.10 msecs. The match was not succesful.
	B takes 0.10 msecs. The match was not succesful.
	C takes 0.10 msecs. The match was not succesful.
	D takes 0.00 msecs. The match was succesful.
	E takes 0.10 msecs. The match was not succesful.
	F takes 0.10 msecs. The match was not succesful.
5.6.0:
	A takes 12.50 msecs. The match was not succesful.
	B takes 12.90 msecs. The match was not succesful.
	C takes 12.20 msecs. The match was not succesful.
	D takes 0.00 msecs. The match was succesful.
	E takes 0.20 msecs. The match was not succesful.
	F takes 0.10 msecs. The match was not succesful.
5.6.1:
	A takes 16.90 msecs. The match was not succesful.
	B takes 17.70 msecs. The match was not succesful.
	C takes 15.40 msecs. The match was not succesful.
	D takes 0.00 msecs. The match was succesful.
	E takes 0.20 msecs. The match was not succesful.
	F takes 0.10 msecs. The match was not succesful.
5.6.2:
	A takes 16.50 msecs. The match was not succesful.
	B takes 17.00 msecs. The match was not succesful.
	C takes 14.60 msecs. The match was not succesful.
	D takes 0.00 msecs. The match was succesful.
	E takes 0.20 msecs. The match was not succesful.
	F takes 0.10 msecs. The match was not succesful.
5.8.0:
	A takes 21.10 msecs. The match was not succesful.
	B takes 20.10 msecs. The match was not succesful.
	C takes 19.90 msecs. The match was not succesful.
	D takes 0.00 msecs. The match was succesful.
	E takes 0.30 msecs. The match was not succesful.
	F takes 0.10 msecs. The match was not succesful.
5.8.1:
	A takes 19.50 msecs. The match was not succesful.
	B takes 20.20 msecs. The match was not succesful.
	C takes 18.70 msecs. The match was not succesful.
	D takes 0.00 msecs. The match was succesful.
	E takes 0.30 msecs. The match was not succesful.
	F takes 0.10 msecs. The match was not succesful.
5.8.2:
	A takes 19.90 msecs. The match was not succesful.
	B takes 19.70 msecs. The match was not succesful.
	C takes 18.20 msecs. The match was not succesful.
	D takes 0.00 msecs. The match was succesful.
	E takes 0.30 msecs. The match was not succesful.
	F takes 0.10 msecs. The match was not succesful.
5.8.3:
	A takes 19.70 msecs. The match was not succesful.
	B takes 19.40 msecs. The match was not succesful.
	C takes 18.40 msecs. The match was not succesful.
	D takes 0.00 msecs. The match was succesful.
	E takes 0.30 msecs. The match was not succesful.
	F takes 0.10 msecs. The match was not succesful.
5.9.0:
	A takes 19.10 msecs. The match was not succesful.
	B takes 19.80 msecs. The match was not succesful.
	C takes 17.80 msecs. The match was not succesful.
	D takes 0.00 msecs. The match was succesful.
	E takes 0.30 msecs. The match was not succesful.
	F takes 0.10 msecs. The match was not succesful.

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About