
[perl #99870] Capturing matches in regex against large strings (8MB) are slow

From: Matthew Horsfall
Date: September 24, 2011 16:48
Subject: [perl #99870] Capturing matches in regex against large strings (8MB) are slow
Message ID: rt-3.6.HEAD-31297-1316804241-1094.99870-75-0@perl.org
# New Ticket Created by  Matthew Horsfall 
# Please include the string:  [perl #99870]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=99870 >


This is a bug report for perl from wolfsage@gmail.com,
generated with the help of perlbug 1.39 running under perl 5.10.1.


-----------------------------------------------------------------
Capturing regexp matches against large strings (tested with an 8MB string)
are significantly slower than non-capturing ones.

Below is an example script that times four different 10k-iteration loops:
capturing and non-capturing matches against a 1k string and an 8MB string.
The capture/match is quite simple and should take almost no time at all.

Example (also found here: https://gist.github.com/1238022)

#!/usr/bin/perl

use strict;
use warnings;

$|++;

use Time::HiRes qw(gettimeofday);

my $start;
my $end;

my $string = 'x' x (1024);
my $string2 = 'x' x (1024 * 1024 * 8);

print "Non-capture match against 1k: ";
$start = gettimeofday;
for (1..10_000) {
        $string =~ /^x/;
}
$end = gettimeofday;
printf("%.02f seconds\n", $end - $start);

print "Non-capture match against 8MB bytes: ";
$start = gettimeofday;
for (1..10_000) {
        $string2 =~ /^x/;
}
$end = gettimeofday;
printf("%.02f seconds\n", $end - $start);

print "Capture match against 1024 bytes: ";
$start = gettimeofday;
for (1..10_000) {
        $string =~ /^(x)/;
}
$end = gettimeofday;
printf("%.02f seconds\n", $end - $start);

print "Capture match against 8MB bytes: ";
$start = gettimeofday;
for (1..10_000) {
        $string2 =~ /^(x)/;
}
$end = gettimeofday;
printf("%.02f seconds\n", $end - $start);

__END__

mhorsfall@darmstadtium:~$ ./woah.pl
Non-capture match against 1k: 0.00 seconds
Non-capture match against 8MB bytes: 0.00 seconds
Capture match against 1024 bytes: 0.00 seconds
Capture match against 8MB bytes: 27.11 seconds
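
The numbers above suggest the capture cost depends on the length of the target string. One way to check that is to repeat the capturing match at a few intermediate sizes. The following is only a minimal sketch under that assumption: the helper name time_capture, the chosen sizes and the iteration count are hypothetical and not part of the original script, and no timings are claimed for it since they will vary by perl version and machine.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Time::HiRes qw(time);

# Hypothetical helper (not in the original report): run $iterations
# capturing matches against a string of $size 'x' characters and
# return the elapsed wall-clock seconds plus the number of matches.
sub time_capture {
    my ($size, $iterations) = @_;

    my $string  = 'x' x $size;
    my $matched = 0;

    my $start = time();
    for (1 .. $iterations) {
        # Same capturing pattern as in the report.
        $matched++ if $string =~ /^(x)/;
    }
    my $elapsed = time() - $start;

    return ($elapsed, $matched);
}

# Assumed sizes: 1k, 1MB and 8MB, to see how elapsed time moves with length.
for my $size (1024, 1024 * 1024, 1024 * 1024 * 8) {
    my ($elapsed, $matched) = time_capture($size, 1_000);
    printf("%9d bytes: %d matches in %.02f seconds\n",
        $size, $matched, $elapsed);
}
```

If the elapsed time grows roughly in step with the string size while the match count stays the same, that would point at per-match work proportional to the whole string rather than to the one character actually captured.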


[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=low
---
Site configuration information for perl 5.10.1:

Configured by Debian Project at Tue Apr 26 15:53:22 UTC 2011.

Summary of my perl5 (revision 5 version 10 subversion 1) configuration:

  Platform:
    osname=linux, osvers=2.6.24-29-server,
archname=x86_64-linux-gnu-thread-multi
    uname='linux crested 2.6.24-29-server #1 smp wed mar 16 19:04:28 utc
2011 x86_64 x86_64 x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN
-Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr
-Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10
-Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5
-Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.1
-Dsitearch=/usr/local/lib/perl/5.10.1 -Dman1dir=/usr/share/man/man1
-Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1
-Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl
-Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm
-DDEBUGGING=-g -Doptimize=-O2 -Dplibpth=/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu -Duseshrplib -Dlibperl=libperl.so.5.10.1
-Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe
-fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.5.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu
/lib /usr/lib /lib64 /usr/lib64
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=, so=so, useshrplib=true, libperl=libperl.so.5.10.1
    gnulibc_version='2.13'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib
-fstack-protector'

Locally applied patches:
    DEBPKG:debian/arm_thread_stress_timeout -
http://bugs.debian.org/501970 Raise the timeout of
ext/threads/shared/t/stress.t to accommodate slower
build hosts
    DEBPKG:debian/cpan_config_path - Set location of CPAN::Config to
/etc/perl as /usr may not be writable.
    DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS
default for modules installed from CPAN.
    DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly
restrictive DB_File version check.
    DEBPKG:debian/doc_info - Replace generic man(1) instructions with
Debian-specific information.
    DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to
follow symlinks and ignore missing @INC directories.
    DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno
version check due to upgrade problems with long-running processes.
    DEBPKG:debian/extutils_hacks - Various debian-specific ExtUtils changes
    DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the
binary targets.
    DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist
files for core or vendor.
    DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per
Debian policy.
    DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to
/etc/perl/Net as /usr may not be writable.
    DEBPKG:debian/m68k_thread_stress - http://bugs.debian.org/495826 Disable
some threads tests on m68k for now due to missing TLS.
    DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
    DEBPKG:debian/module_build_man_extensions -
http://bugs.debian.org/479460 Adjust Module::Build manual page extensions
for the Debian Perl policy
    DEBPKG:debian/perl_synopsis - http://bugs.debian.org/278323 Rearrange
perl.pod
    DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list
of libraries wanted to what we actually need.
    DEBPKG:debian/use_gdbm - Explicitly link against -lgdbm_compat in
ODBM_File/NDBM_File.
    DEBPKG:fixes/assorted_docs - http://bugs.debian.org/443733 [384f06a]
Math::BigInt::CalcEmu documentation grammar fix
    DEBPKG:fixes/net_smtp_docs - http://bugs.debian.org/100195
[rt.cpan.org#36038] Document the Net::SMTP 'Port' option
    DEBPKG:fixes/processPL - http://bugs.debian.org/357264
[rt.cpan.org#17224] Always use PERLRUNINST when building perl modules.
    DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip
include directories in /usr/local
    DEBPKG:fixes/pod2man-index-backslash -
http://bugs.debian.org/521256 Escape backslashes in .IX entries
    DEBPKG:debian/disable-zlib-bundling - Disable zlib bundling in
Compress::Raw::Zlib
    DEBPKG:fixes/kfreebsd_cppsymbols -
http://bugs.debian.org/533098 [3b910a0] Add gcc predefined macros to
$Config{cppsymbols} on GNU/kFreeBSD.
    DEBPKG:debian/cpanplus_definstalldirs -
http://bugs.debian.org/533707 Configure CPANPLUS to use the site
directories by default.
    DEBPKG:debian/cpanplus_config_path - Save local versions of
CPANPLUS::Config::System into /etc/perl.
    DEBPKG:fixes/kfreebsd-filecopy-pipes -
http://bugs.debian.org/537555 [16f708c] Fix File::Copy::copy with pipes
on GNU/kFreeBSD
    DEBPKG:fixes/anon-tmpfile-dir - http://bugs.debian.org/528544 [perl
#66452] Honor TMPDIR when open()ing an anonymous temporary file
    DEBPKG:fixes/abstract-sockets - http://bugs.debian.org/329291 [89904c0]
Add support for Abstract namespace sockets.
    DEBPKG:fixes/hurd_cppsymbols - http://bugs.debian.org/544307 [eeb92b7]
Add gcc predefined macros to $Config{cppsymbols} on GNU/Hurd.
    DEBPKG:fixes/autodie-flock - http://bugs.debian.org/543731 Allow for
flock returning EAGAIN instead of EWOULDBLOCK on linux/parisc
    DEBPKG:fixes/archive-tar-instance-error - http://bugs.debian.org/539355
[rt.cpan.org #48879] Separate Archive::Tar instance error strings from each
other
    DEBPKG:fixes/positive-gpos - http://bugs.debian.org/545234 [perl #69056]
[c584a96] Fix \\G crash on first match
    DEBPKG:debian/devel-ppport-ia64-optim -
http://bugs.debian.org/548943 Work around an ICE on ia64
    DEBPKG:fixes/trie-logic-match - http://bugs.debian.org/552291 [perl
#69973] [0abd0d7] Fix a DoS in Unicode processing [CVE-2009-3626]
    DEBPKG:fixes/hppa-thread-eagain - http://bugs.debian.org/554218 make the
threads-shared test suite more robust, fixing failures on hppa
    DEBPKG:fixes/crash-on-undefined-destroy -
http://bugs.debian.org/564074 [perl #71952] [1f15e67] Fix a NULL
pointer dereference when looking for a
DESTROY method
    DEBPKG:fixes/tainted-errno - http://bugs.debian.org/574129 [perl #61976]
[be1cf43] fix an errno stringification bug in taint mode
    DEBPKG:fixes/safe-upgrade - http://bugs.debian.org/582978 Upgrade
Safe.pm to 2.25, fixing CVE-2010-1974
    DEBPKG:fixes/tell-crash - http://bugs.debian.org/578577 [f4817f3] Fix a
tell() crash on bad arguments.
    DEBPKG:fixes/format-write-crash - http://bugs.debian.org/579537 [perl
#22977] [421f30e] Fix a crash in format/write
    DEBPKG:fixes/arm-alignment - http://bugs.debian.org/289884 [f1c7503]
Prevent gcc from optimizing the alignment test away on armel
    DEBPKG:fixes/fcgi-test - Fix a failure in CGI/t/fast.t when FCGI is
installed
    DEBPKG:fixes/hurd-ccflags - http://bugs.debian.org/587901 Make
hints/gnu.sh append to $ccflags rather than overriding them
    DEBPKG:debian/squelch-locale-warnings -
http://bugs.debian.org/508764 Squelch locale warnings in Debian package
maintainer scripts
    DEBPKG:fixes/lc-numeric-docs - http://bugs.debian.org/379329 [perl
#78452] [903eb63] LC_NUMERIC documentation fixes
    DEBPKG:fixes/lc-numeric-sprintf - http://bugs.debian.org/601549 [perl
#78632] [b3fd614] Fix sprintf not to ignore LC_NUMERIC with constants
    DEBPKG:fixes/concat-stack-corruption -
http://bugs.debian.org/596105 [perl #78674] [e3393f5] Fix stack pointer
corruption in pp_concat() with
'use encoding'
    DEBPKG:fixes/cgi-multiline-header -
http://bugs.debian.org/606995 [CVE-2010-2761 CVE-2010-4410
CVE-2010-4411] CGI.pm MIME boundary and
multiline header vulnerabilities
    DEBPKG:fixes/h2ph-gcc-4.5 - http://bugs.debian.org/599933 [8d66b3f] Fix
h2ph and test
    DEBPKG:fixes/threads-tmps-crash - [perl #70411] [24855df] Conditionally
compile tmps stack cleanup code
    DEBPKG:patchlevel - http://bugs.debian.org/567489 List packaged patches
for 5.10.1-17ubuntu1 in patchlevel.h

---
@INC for perl 5.10.1:
    /etc/perl
    /usr/local/lib/perl/5.10.1
    /usr/local/share/perl/5.10.1
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.10
    /usr/share/perl/5.10
    /usr/local/lib/site_perl
    .

---
Environment for perl 5.10.1:
    HOME=/home/mhorsfall
    LANG=C
    LANGUAGE=en_US:en
    LC_MESSAGES=en_US.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)

    PATH=/home/mhorsfall/bin:/home/mhorsfall/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
    PERL_BADLANG (unset)
    SHELL=/bin/bash
