
[perl #99870] Capturing matches in regex against large strings (8MB) are slow

Matthew Horsfall
September 24, 2011 16:48
# New Ticket Created by  Matthew Horsfall 
# Please include the string:  [perl #99870]
# in the subject line of all future correspondence about this issue. 
# <URL: >

This is a bug report for perl from,
generated with the help of perlbug 1.39 running under perl 5.10.1.

Capturing regexp matches against large strings (tested with an 8MB string)
are significantly slow.

Below is an example script that shows the time differences between four
loops of 10k iterations each, covering capturing and non-capturing matches
against a 1k string and an 8MB string. The capture/match is quite simple
and should take no time at all.

Example (also found here:


use strict;
use warnings;


use Time::HiRes qw(gettimeofday);

my $start;
my $end;

my $string = 'x' x (1024);
my $string2 = 'x' x (1024 * 1024 * 8);

print "Non-capture match against 1k: ";
$start = gettimeofday;
for (1..10_000) {
        $string =~ /^x/;
}
$end = gettimeofday;
printf("%.02f seconds\n", $end - $start);

print "Non-capture match against 8MB bytes: ";
$start = gettimeofday;
for (1..10_000) {
        $string2 =~ /^x/;
}
$end = gettimeofday;
printf("%.02f seconds\n", $end - $start);

print "Capture match against 1024 bytes: ";
$start = gettimeofday;
for (1..10_000) {
        $string =~ /^(x)/;
}
$end = gettimeofday;
printf("%.02f seconds\n", $end - $start);

print "Capture match against 8MB bytes: ";
$start = gettimeofday;
for (1..10_000) {
        $string2 =~ /^(x)/;
}
$end = gettimeofday;
printf("%.02f seconds\n", $end - $start);


mhorsfall@darmstadtium:~$ ./
Non-capture match against 1k: 0.00 seconds
Non-capture match against 8MB bytes: 0.00 seconds
Capture match against 1024 bytes: 0.00 seconds
Capture match against 8MB bytes: 27.11 seconds
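
For anyone reproducing this, the four timing loops in the script can be
condensed into a single helper. This is only a sketch and not part of the
original report: the helper name time_matches and the reduced iteration
count of 1_000 are assumptions made here for illustration, and it uses
precompiled qr// patterns instead of the literal patterns in the script, so
timings may not be directly comparable with the figures above.

```perl
use strict;
use warnings;

use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper (not in the original report): run $count matches of
# $regex against the string behind $string_ref and return elapsed seconds.
sub time_matches {
    my ($string_ref, $regex, $count) = @_;
    my $t0 = [gettimeofday];
    for (1 .. $count) {
        $$string_ref =~ $regex;
    }
    return tv_interval($t0);
}

# Same two string sizes as in the report.
my %strings = (
    '1k'  => 'x' x 1024,
    '8MB' => 'x' x (1024 * 1024 * 8),
);

# Same two patterns as in the report, precompiled.
my @cases = (
    [ 'Non-capture', qr/^x/   ],
    [ 'Capture',     qr/^(x)/ ],
);

# Assumption: 1_000 iterations instead of 10_000, to keep slow runs short.
my $count = 1_000;

for my $case (@cases) {
    my ($label, $regex) = @$case;
    for my $size ('1k', '8MB') {
        my $elapsed = time_matches(\$strings{$size}, $regex, $count);
        printf("%s match against %s: %.02f seconds\n", $label, $size, $elapsed);
    }
}
```

If the slowdown is present, only the "Capture match against 8MB" line
should stand out from the other three.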

[Please do not change anything below this line]
Site configuration information for perl 5.10.1:

Configured by Debian Project at Tue Apr 26 15:53:22 UTC 2011.

Summary of my perl5 (revision 5 version 10 subversion 1) configuration:

    osname=linux, osvers=2.6.24-29-server,
    uname='linux crested 2.6.24-29-server #1 smp wed mar 16 19:04:28 utc
2011 x86_64 x86_64 x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN
-Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr
-Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10
-Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5
-Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.1
-Dsitearch=/usr/local/lib/perl/5.10.1 -Dman1dir=/usr/share/man/man1
-Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1
-Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl
-Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm
-DDEBUGGING=-g -Doptimize=-O2 -Dplibpth=/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu -Duseshrplib
-Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe
-fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.5.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu
/lib /usr/lib /lib64 /usr/lib64
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=, so=so, useshrplib=true,
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib

Locally applied patches:
    DEBPKG:debian/arm_thread_stress_timeout - the timeout of
ext/threads/shared/t/stress.t to accommodate slower
build hosts
    DEBPKG:debian/cpan_config_path - Set location of CPAN::Config to
/etc/perl as /usr may not be writable.
    DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS
default for modules installed from CPAN.
    DEBPKG:debian/db_file_ver - Remove overly
restrictive DB_File version check.
    DEBPKG:debian/doc_info - Replace generic man(1) instructions with
Debian-specific information.
    DEBPKG:debian/enc2xs_inc - Tweak enc2xs to
follow symlinks and ignore missing @INC directories.
    DEBPKG:debian/errno_ver - Remove Errno
version check due to upgrade problems with long-running processes.
    DEBPKG:debian/extutils_hacks - Various debian-specific ExtUtils changes
    DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the
binary targets.
    DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist
files for core or vendor.
    DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per
Debian policy.
    DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to
/etc/perl/Net as /usr may not be writable.
    DEBPKG:debian/m68k_thread_stress - Disable
some threads tests on m68k for now due to missing TLS.
    DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
    DEBPKG:debian/module_build_man_extensions - Adjust Module::Build manual page extensions
for the Debian Perl policy
    DEBPKG:debian/perl_synopsis - Rearrange
    DEBPKG:debian/prune_libs - Prune the list
of libraries wanted to what we actually need.
    DEBPKG:debian/use_gdbm - Explicitly link against -lgdbm_compat in
    DEBPKG:fixes/assorted_docs - [384f06a]
Math::BigInt::CalcEmu documentation grammar fix
    DEBPKG:fixes/net_smtp_docs -
[] Document the Net::SMTP 'Port' option
    DEBPKG:fixes/processPL -
[] Always use PERLRUNINST when building perl modules.
    DEBPKG:debian/perlivp - Make perlivp skip
include directories in /usr/local
    DEBPKG:fixes/pod2man-index-backslash - backslashes in .IX entries
    DEBPKG:debian/disable-zlib-bundling - Disable zlib bundling in
    DEBPKG:fixes/kfreebsd_cppsymbols -[3b910a0] Add gcc predefined macros to
$Config{cppsymbols} on GNU/kFreeBSD.
    DEBPKG:debian/cpanplus_definstalldirs - CPANPLUS to use the site
directories by default.
    DEBPKG:debian/cpanplus_config_path - Save local versions of
CPANPLUS::Config::System into /etc/perl.
    DEBPKG:fixes/kfreebsd-filecopy-pipes -[16f708c] Fix File::Copy::copy with pipes
on GNU/kFreeBSD
    DEBPKG:fixes/anon-tmpfile-dir - [perl
#66452] Honor TMPDIR when open()ing an anonymous temporary file
    DEBPKG:fixes/abstract-sockets - [89904c0]
Add support for Abstract namespace sockets.
    DEBPKG:fixes/hurd_cppsymbols - [eeb92b7]
Add gcc predefined macros to $Config{cppsymbols} on GNU/Hurd.
    DEBPKG:fixes/autodie-flock - Allow for
flock returning EAGAIN instead of EWOULDBLOCK on linux/parisc
    DEBPKG:fixes/archive-tar-instance-error -[ #48879] Separate Archive::Tar instance error strings from each
    DEBPKG:fixes/positive-gpos - [perl #69056]
[c584a96] Fix \\G crash on first match
    DEBPKG:debian/devel-ppport-ia64-optim - around an ICE on ia64
    DEBPKG:fixes/trie-logic-match - [perl
#69973] [0abd0d7] Fix a DoS in Unicode processing [CVE-2009-3626]
    DEBPKG:fixes/hppa-thread-eagain - make the
threads-shared test suite more robust, fixing failures on hppa
    DEBPKG:fixes/crash-on-undefined-destroy -[perl #71952] [1f15e67] Fix a NULL
pointer dereference when looking for a
DESTROY method
    DEBPKG:fixes/tainted-errno - [perl #61976]
[be1cf43] fix an errno stringification bug in taint mode
    DEBPKG:fixes/safe-upgrade - Upgrade to 2.25, fixing CVE-2010-1974
    DEBPKG:fixes/tell-crash - [f4817f3] Fix a
tell() crash on bad arguments.
    DEBPKG:fixes/format-write-crash - [perl
#22977] [421f30e] Fix a crash in format/write
    DEBPKG:fixes/arm-alignment - [f1c7503]
Prevent gcc from optimizing the alignment test away on armel
    DEBPKG:fixes/fcgi-test - Fix a failure in CGI/t/fast.t when FCGI is
    DEBPKG:fixes/hurd-ccflags - Make
hints/ append to $ccflags rather than overriding them
    DEBPKG:debian/squelch-locale-warnings - locale warnings in Debian package
maintainer scripts
    DEBPKG:fixes/lc-numeric-docs - [perl
#78452] [903eb63] LC_NUMERIC documentation fixes
    DEBPKG:fixes/lc-numeric-sprintf - [perl
#78632] [b3fd614] Fix sprintf not to ignore LC_NUMERIC with constants
    DEBPKG:fixes/concat-stack-corruption -[perl #78674] [e3393f5] Fix stack pointer
corruption in pp_concat() with
'use encoding'
    DEBPKG:fixes/cgi-multiline-header -[CVE-2010-2761 CVE-2010-4410
CVE-2010-4411] MIME boundary and
multiline header vulnerabilities
    DEBPKG:fixes/h2ph-gcc-4.5 - [8d66b3f] Fix
h2ph and test
    DEBPKG:fixes/threads-tmps-crash - [perl #70411] [24855df] Conditionally
compile tmps stack cleanup code
    DEBPKG:patchlevel - List packaged patches
for 5.10.1-17ubuntu1 in patchlevel.h

@INC for perl 5.10.1:

Environment for perl 5.10.1:
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)

    PERL_BADLANG (unset)
