develooper Front page | perl.perl5.porters | Postings from August 2012

Slowdown in longer hash keys

Thread Next
From:
Karl Williamson
Date:
August 24, 2012 13:15
Subject:
Slowdown in longer hash keys
Message ID:
5037E0BA.1090509@khwilliamson.com
I was trying to benchmark the effects of moving to a binary search 
instead of a swash hash, as was discussed in the recent thread entitled 
"Seeking guidance on Benchmarking and potentially removing swash caching"

I tried an implementation in which I just added a key to the swash hash 
"USE_INVLIST".  Each time a character was looked up in the hash, it 
would first check if there is an element in the hash with that key, and 
use the swash's inversion list (stored under the key "INVLIST") instead 
of its normal swash behavior.

I was surprised to find that this was twice as slow as without. 
Eventually, I used gprof to pin down the reason why, and got the 
following results.

First, the old way:
   %   cumulative   self              self     total
  time   seconds   seconds    calls   s/call   s/call  name
  32.01     18.39    18.39 200863004     0.00     0.00  Perl_hv_common
  21.72     30.88    12.48 200854008     0.00     0.00  Perl_swash_fetch

Then, the new:
   %   cumulative   self              self     total
  time   seconds   seconds    calls   s/call   s/call  name
  60.69     71.47    71.47 400679084     0.00     0.00  Perl_hv_common
   6.07     78.62     7.15 200854008     0.00     0.00  Perl_swash_fetch

There were about twice as many calls to hv_common, but the time spent in 
it nearly quadrupled.  I eventually tried shortening the keys listed 
above to one byte each, and I got this:
   %   cumulative   self              self     total
  time   seconds   seconds    calls   s/call   s/call  name
  47.40     41.78    41.78 400682134     0.00     0.00  Perl_hv_common
   8.70     49.45     7.67 200854008     0.00     0.00  Perl_swash_fetch

It appears to me that the length of the keys had an outsize impact on 
performance.

(I eventually went to an implementation which avoids using extra hash 
keys, and the performance of the binary search over the hash improved by 
about 37% when the list was 64 elements or less long (using binary 
search for shorter lists, and the hash for longer).  The benchmark was 
on a Korean text using the \X regex match (which is quite complicated 
for Korean) )

Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About