Re: [patch] probable Unicode speed trap solution

From: Pradeep Hodigere
Date: April 14, 2003 17:51
Subject: Re: [patch] probable Unicode speed trap solution
Message ID: 20030415005131.89031.qmail@web12307.mail.yahoo.com

Thanks Rafael.

  I'll try benchmarking and running some tests on
Jarkko's patch.

  There was also an issue with the perlunicode document
that I mentioned in my previous posting. It might
have been lost in the details.

It read:

  On a side note, I believe the 'speed' section in
the current perlunicode document may not accurately
reflect the comparison between split() and substr().

  perldoc perlunicode says: 

  "Even though the algorithm based on substr() is
faster than split() for byte-encoded data, it pales in
comparison to the speed of split() when used with
UTF-8 data. "

  Benchmarking split() and substr() gave me the
following results that indicate substr() is faster
than split().

  Please review the following benchmark results:
  
Since we need to benchmark substr() alone, I store the
length of the string in advance.

% perl -e '
  use Benchmark;
  use strict;
  our $l = 10000;
  our $b; our $u;
  our $bl; our $ul;
 
  our $b = our $u = "x" x $l;
  substr($u,0, 1) = "\x{100}";
  my $bl = length($b) - 1;
  my $ul = length($u) - 1;
  timethese(-5,{
    SPLIT_B => q{ for my $c (split //, $b){}  },
    SPLIT_U => q{ for my $c (split //, $u){}  },
    SUBSTR_N_B => q{ for my $i (0..$bl){my $c = substr($b,$i,1);} },
    SUBSTR_N_U => q{ for my $i (0..$ul){my $c = substr($u,$i,1);} },
  });'
 
The following are the results:
Benchmark: running SPLIT_B, SPLIT_U, SUBSTR_N_B, SUBSTR_N_U, each for at least 5 CPU seconds...
   SPLIT_B:  6 wallclock secs ( 5.30 usr +  0.02 sys =  5.32 CPU) @ 51.13/s (n=272)
   SPLIT_U:  5 wallclock secs ( 5.37 usr +  0.00 sys =  5.37 CPU) @ 50.28/s (n=270)
SUBSTR_N_B:  5 wallclock secs ( 5.13 usr +  0.00 sys =  5.13 CPU) @ 250791.62/s (n=1286561)
SUBSTR_N_U:  4 wallclock secs ( 5.31 usr +  0.00 sys =  5.31 CPU) @ 10028.63/s (n=53252)

  Compare this with the following script, which is in
the perlunicode man page:

% perl -e '
  use Benchmark;
  use strict;
  our $l = 10000;
  our $b; our $u;
  our $bl; our $ul;
 
  our $b = our $u = "x" x $l;
  substr($u,0, 1) = "\x{100}";
  timethese(-5,{
  SPLIT_B => q{ for my $c (split //, $b){}  },
  SPLIT_U => q{ for my $c (split //, $u){}  },
  SUBSTR_L_B => q{ for my $i (0..length($b)-1){my $c = substr($b,$i,1);} },
  SUBSTR_L_U => q{ for my $i (0..length($u)-1){my $c = substr($u,$i,1);} },
});'

  Its results are:
Benchmark: running SPLIT_B, SPLIT_U, SUBSTR_L_B, SUBSTR_L_U, each for at least 5 CPU seconds...
   SPLIT_B:  5 wallclock secs ( 5.35 usr +  0.00 sys =  5.35 CPU) @ 50.84/s (n=272)
   SPLIT_U:  5 wallclock secs ( 5.36 usr +  0.01 sys =  5.37 CPU) @ 50.65/s (n=272)
SUBSTR_L_B:  6 wallclock secs ( 5.34 usr +  0.00 sys =  5.34 CPU) @ 98.31/s (n=525)
SUBSTR_L_U:  6 wallclock secs ( 5.74 usr +  0.00 sys =  5.74 CPU) @  0.70/s (n=4)


  The first set of results is faster than the second
as length() is called while benchmarking substr() in
the second test.

  So the actual comparison between split() and
substr() is in the first set of results. As is
apparent, substr() is faster than split() in those
tests.
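
  For anyone who wants to rerun the comparison from a script
file, here is a minimal sketch that passes code references to
timethese() instead of strings. Code references close over "my"
variables, whereas string snippets are compiled inside Benchmark
and can only see package variables (those declared with "our").
The helper names walk_split and walk_substr, and the iteration
count of 50, are my own choices for illustration and are not
taken from perlunicode.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Same test data as in the scripts above: a byte string and a copy
# whose first character is above U+00FF, which forces UTF-8 storage.
my $len   = 10_000;
my $bytes = "x" x $len;
my $utf8  = $bytes;
substr($utf8, 0, 1) = "\x{100}";

# Walk a string one character at a time with split(); returns the count.
sub walk_split {
    my ($str) = @_;
    my $n = 0;
    for my $c (split //, $str) { $n++ }
    return $n;
}

# Walk a string with substr(); the last index is computed once,
# before the loop starts, so length() is not part of the loop body.
sub walk_substr {
    my ($str) = @_;
    my $last = length($str) - 1;
    my $n = 0;
    for my $i (0 .. $last) {
        my $c = substr($str, $i, 1);
        $n++;
    }
    return $n;
}

# A small fixed iteration count keeps the sketch quick to run.
# Pass a negative number (for example -5) to run each case for at
# least that many CPU seconds, as the scripts above do.
timethese(50, {
    SPLIT_B  => sub { walk_split($bytes) },
    SPLIT_U  => sub { walk_split($utf8) },
    SUBSTR_B => sub { walk_substr($bytes) },
    SUBSTR_U => sub { walk_substr($utf8) },
});
```

  Both helpers return the number of characters they visited, which
gives a quick way to confirm that every case really walks the
whole string before trusting the timings.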

-pradeep



--- Rafael Garcia-Suarez <rgarciasuarez@free.fr> wrote:
> Pradeep Hodigere wrote:
> >  perldoc perlunicode's speed section mentions the
> > slowness of length(), substr() and index() functions
> > when handling UTF-8 encoded strings. A mail thread
> > titled 'Unicode speed trap' discusses probable
> > solutions to this problem.
> >
> >    I have an implementation that might be a solution
> > to this issue and have attached a patch for the same.
> > The patch brought about a sizeable performance
> > improvement in length() and substr() functions.
>
> Thanks a lot for your work, but it appears that Jarkko Hietaniemi has
> already worked on this problem, and has implemented a slightly more
> sophisticated solution as change #18353.
>
> Of course, what would be helpful, if you're inclined, is to read
> Jarkko's code and try to find holes in it -- as he says, "code this
> hairy is bound to have hairy trolls hiding under it". One of the ways
> to achieve this is to write tests, if you feel that some part of it
> is not thoroughly tested.
>
> See for further reference :
>   http://archive.develooper.com/perl5-changes%40perl.org/msg06360.html
>   http://use.perl.org/article.pl?sid=03/04/14/1129241
> and the perlhack manpage, if you want to get the development version of
> perl.
>
> -- 
> Unofficial is not *NIX

