develooper Front page | perl.perl5.porters | Postings from August 2012

[perl #114576] check if %hash 500x times slower than if keys %hash

Thread Previous | Thread Next
From:
James E Keenan via RT
Date:
August 24, 2012 18:52
Subject:
[perl #114576] check if %hash 500x times slower than if keys %hash
Message ID:
rt-3.6.HEAD-11172-1345859533-53.114576-15-0@perl.org
On Fri Aug 24 16:32:12 2012, mgrigoriev@nvidia.com wrote:
> 
> This is a bug report for perl from mgrigoriev@smtp.nvidia.com,
> generated with the help of perlbug 1.39 running under perl 5.14.1.
> 
> 
> -----------------------------------------------------------------
> [Please describe your issue here]
> 
> if you run this benchmark:
> 
> 
> use Benchmark qw( cmpthese :hireswallclock );
> 
>  my $hash = { 1 .. 10000 };
>  my %hash = ( 1 .. 10000 );
> 
>  cmpthese( -10, {
>      'if (keys %$hash)' => sub { if( keys %$hash ) { return 1 }},
>          'if (keys $hash)'  => sub { if( keys $hash  )  {return 1 } },
> 	     'if (keys %hash)'  => sub { if( keys %hash  ) { return 1 }},
> 	         'if ( %$hash )'    => sub { if( %$hash  )  {return 1 }},
> 		     'if ( %hash )'     => sub { if( %hash  ) { return 1 }},
> 		         'if ( scalar %$hash )'   => sub { if( scalar  %$hash  )
>    {return 1 }},
> 			     'if ( scalar  %hash )'  => sub { if( scalar %hash  ) { return
>    1 }},
> 			     });
> then you will get this result:
> Rate if ( %hash ) if ( scalar %$hash ) if ( %$hash ) if ( scalar
>    %hash ) if (keys $hash) if (keys %$hash) if (keys %hash)
> 
> if ( %hash )           136924/s           --                  -0%
>    -0%                  -0%            -99%             -99%
>    -99%
> if ( scalar %$hash )   136940/s           0%                   --
>    -0%                  -0%            -99%             -99%
>    -99%
> if ( %$hash )          137054/s           0%                   0%
>    --                  -0%            -99%             -99%
>    -99%
> if ( scalar  %hash )   137187/s           0%                   0%
>    0%                   --            -99%             -99%
>    -99%
> if (keys $hash)      14306992/s       10349%               10348%
>    10339%               10329%              --             -14%
>    -28%
> if (keys %$hash)     16698271/s       12095%               12094%
>    12084%               12072%             17%               --
>    -16%
> if (keys %hash)      19933977/s       14458%               14457%
>    14445%               14431%             39%              19%
>    -
> -------------
> as you can see if %hash performs  more than hundred times worse than
>    if keys %hash. It will perform even worse on large hashes, it was
> observed up to 500x worse performance. Seems like this operation is
>    O(n) insteadof O(1).
> Should be fixed ASAP.
> 
> 

AFAICT, all this shows is that correct Perl runs faster than incorrect Perl.

By placing the various versions of $hash and %hash inside the expression
of an 'if' block, you are asking for what's between the parentheses to
be evaluated in scalar context.  But only the variants with 'keys' are
doing what one would expect.

Take this simple program:

#########
$ cat hash.pl
my $hashref = { 1 .. 10000 };
my %hash    = ( 1 .. 10000 );

print "Finished\n";
#########

Now run it through the Perl debugger, using the 'x' debug command to
evaluate expressions and the 'scalar' function to impose a scalar
context similar to that of 'if (EXPR)' (some output trimmed for
readability):

#########
perl -d hash.pl

main::(hash.pl:1):      my $hashref = { 1 .. 10000 };
                                                                       
DB<1> c 4
main::(hash.pl:4):      print "Finished\n";
                                                                       
DB<2> x (scalar keys %hash)
0  5000
                                                                       
DB<3> x (scalar keys %$hashref)
0  5000
                                                                       
DB<4> x (scalar %hash)
0  '3700/8192'
                                                                       
DB<5> x (scalar %$hashref)
0  '3700/8192'
##########
Only the first two variants DWIM.

What's happening here?   Camel book, 4th ed., page 85 states:

##########
"To find the number of keys in a hash, use the keys function in scalar
context."
##########

That's what we're doing in the first two variants.  However, just before
that quotation, Camel says:

##########
When you evaluate a hash variable in scalar context, it returns a true
value only if the hash contains any key/value pairs whatsoever.  If
there are any key/value pairs at all, the value returned is a string
consisting of the number of used buckets and the number of allocated
buckets, separated by a slash.  This is pretty much only useful to find
out whether Perl's (compiled in) hashing algorithm is performing poorly
on your data set.
##########

'perldoc perldata' contains much of the same argument.

Now, I myself can't say why 'if (%hash)' is so much slower than 'if
(keys %hash)'.  But I can say that the former is doing something very
different from the latter.  Hence there is no basis for comparing the
execution speeds of the two expressions.  And there is nothing to be fixed.

Unless others want to provide more detail, I recommend that this RT be
rejected as a "wontfix".

Thank you very much.
Jim Keenan

---
via perlbug:  queue: perl5 status: new
https://rt.perl.org:443/rt3/Ticket/Display.html?id=114576

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About