
[ID 20020627.001] regex and utf-8 performance problem

From: Stefan Traby
Date: June 27, 2002 00:40
Subject: [ID 20020627.001] regex and utf-8 performance problem
Message ID: E17NTnG-0002gm-00@stefan

This is a bug report for perl from stefan@hello-penguin.com,
generated with the help of perlbug 1.33 running under perl v5.8.0.


-----------------------------------------------------------------
[Please enter your report here]

Some regexes are extremely slow on 5.8.0pre1 when the
target string is UTF-8 encoded...

----------------------------------------
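use Convert::Scalar;
use Benchmark;

# Wrap each line of $_[0] to at most $_[1] characters, breaking at whitespace.
sub wrap_text {
    my $x;
    for (split /\n/, $_[0]) {
        # Take up to $_[1] characters followed by whitespace or end of
        # line, and terminate each such chunk with a newline.
        s/\G(.{1,$_[1]})(?:\s+|$)/$1\n/gm;
        $x .= $_;
    }
    $x =~ s/[ \t\015]+$//g;    # no /m: strips blanks at the very end only
    $x;
}
my ($x, $y);

# 250 lines, each holding 30 copies of "hello\t" (180 characters per line).
$x = "hello\t" x 30;
$x = "$x\n" x 250;

# $y holds the same characters as $x but is stored as UTF-8 internally.
$y = $x;
Convert::Scalar::utf8_upgrade($y);

timethese(300, { "ascii" => sub { wrap_text $x, 80 },
                 "utf-8" => sub { wrap_text $y, 80 }
});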

----------------------------------------
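
For anyone without Convert::Scalar installed, here is a minimal sketch of the same setup using the core utf8::upgrade function. It assumes utf8::upgrade sets the same internal UTF-8 flag as Convert::Scalar::utf8_upgrade does. That is my understanding, but it was not checked against the module source. The variable names and the reduced line count are illustrative only. The sketch checks only that both strings hold the same characters and wrap to the same text, so any timing difference would come from the internal representation alone.

```perl
use strict;
use warnings;

# Same wrapping logic as wrap_text in the report, with named arguments.
sub wrap_text {
    my ($text, $width) = @_;
    my $out = '';
    for my $line (split /\n/, $text) {
        $line =~ s/\G(.{1,$width})(?:\s+|$)/$1\n/gm;
        $out .= $line;
    }
    $out =~ s/[ \t\015]+$//g;
    return $out;
}

# Five lines instead of the report's 250, to keep the check quick.
my $ascii = (("hello\t" x 30) . "\n") x 5;

# Assumption: core utf8::upgrade stands in for Convert::Scalar::utf8_upgrade.
my $upgraded = $ascii;
utf8::upgrade($upgraded);

print "same characters:  ", ($ascii eq $upgraded ? "yes" : "no"), "\n";
print "ascii flagged:    ", (utf8::is_utf8($ascii)    ? "yes" : "no"), "\n";
print "upgraded flagged: ", (utf8::is_utf8($upgraded) ? "yes" : "no"), "\n";
print "same wrapped text: ",
    (wrap_text($ascii, 80) eq wrap_text($upgraded, 80) ? "yes" : "no"), "\n";
```

To time it, the two strings can be passed to timethese the same way as $x and $y in the script above.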


Benchmark: timing 300 iterations of ascii, utf-8...
     ascii:  3 wallclock secs ( 2.91 usr +  0.00 sys =  2.91 CPU) @ 103.09/s (n=300)
     utf-8: 26 wallclock secs (26.24 usr +  0.00 sys = 26.24 CPU) @ 11.43/s (n=300)
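
To put the two result lines in proportion, here is a small sketch that derives per-call cost and the slowdown factor from the CPU times printed above. The figures are copied from that output. The hash and function names are made up for illustration.

```perl
use strict;
use warnings;

# CPU seconds and iteration count copied from the Benchmark output above.
my %cpu_secs = (ascii => 2.91, 'utf-8' => 26.24);
my $iterations = 300;

# Milliseconds of CPU time spent per wrap_text call.
sub ms_per_call { return 1000 * $cpu_secs{ $_[0] } / $iterations }

# How many times slower the UTF-8 case is than the ASCII case.
sub slowdown { return $cpu_secs{'utf-8'} / $cpu_secs{ascii} }

for my $name (sort keys %cpu_secs) {
    printf "%-6s %6.2f ms/call  %7.2f calls/s\n",
        $name, ms_per_call($name), $iterations / $cpu_secs{$name};
}
printf "slowdown: %.1fx\n", slowdown();   # prints "slowdown: 9.0x"
```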


   
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=low
---
Site configuration information for perl v5.8.0:

Configured by root at Fri Jun 14 22:39:17 MEST 2002.

Summary of my perl5 (revision 5.0 version 8 subversion 0 patch 17247) configuration:
  Platform:
    osname=linux, osvers=2.4.19-pre9, archname=i686-linux
    uname='linux stefan 2.4.19-pre9 #1 sam jun 1 12:43:24 mest 2002 i686 unknown '
    config_args='-Dprefix=/usr/websys -Doptimize=-O3 -march=i686 -momit-leaf-frame-pointer -Duseperlio=true -Dusethreads=undef -Dperladmin=stefan@hello-penguin.com -Dusemultiplicity=undef -Dusedevel -Dusemymalloc=true -Duselargefiles=true -Duseposix=true -Dlocincpth=/usr/websys/include /opt/include /usr/local/include -Dloclibpth=/usr/websys/lib /opt/lib /usr/local/lib -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -I/usr/websys/include -I/opt/include -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O3 -march=i686 -momit-leaf-frame-pointer',
    cppflags='-fno-strict-aliasing -I/usr/websys/include -I/opt/include -I/usr/local/include'
    ccversion='', gccversion='2.95.4 20011002 (Debian prerelease)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/websys/lib -L/opt/lib -L/usr/local/lib'
    libpth=/usr/websys/lib /opt/lib /usr/local/lib /lib /usr/lib
    libs=-lnsl -ldb -ldl -lm -lc -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil
    libc=/lib/libc-2.2.5.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.2.5'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/websys/lib -L/opt/lib -L/usr/local/lib'

Locally applied patches:
    DEVEL17237

---
@INC for perl v5.8.0:
    /usr/websys/lib/perl5/5.8.0/i686-linux
    /usr/websys/lib/perl5/5.8.0
    /usr/websys/lib/perl5/site_perl/5.8.0/i686-linux
    /usr/websys/lib/perl5/site_perl/5.8.0
    /usr/websys/lib/perl5/site_perl
    .

---
Environment for perl v5.8.0:
    HOME=/.localvol000/home/stefan
    LANG (unset)
    LANGUAGE (unset)
    LC_CTYPE=de_DE.ISO-8859-1
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/websys/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/bin:/usr/X11R6/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/usr/bin/X11
    PERL_BADLANG (unset)
    SHELL=/bin/bash



