develooper Front page | perl.perl5.porters | Postings from October 2021

Re: non-shared hash keys and large hashes (was Re: Robin HoodHashing for the perl core)

Thread Previous | Thread Next
October 20, 2021 14:50
Re: non-shared hash keys and large hashes (was Re: Robin HoodHashing for the perl core)
Message ID:
Nicholas Clark <> wrote:
:Sort of like this?
:It turns off shared keys once hashes get (roughly) larger than 42 keys.
:(Except for objects and symbol tables)

TLDR: yes please, my test case was 6.25% faster, and used 20% less memory.

It seems to make a big difference for me on a recent real (well, pure)
maths case I worked on (;
the bulk of the work happening is in the 'check' function at:

- fetch blead @ad41dd2f09, install under /opt/hash-blead
- rebase above nwc branch on that blead, install under /opt/hash-nwc
- in both case configure with:
  ./Configure -des -Dcc=gcc -Dprefix=$PREFIX -Doptimize='-g -O6' \
      -Dusedevel -Uversiononly
- create 4.7GB input with `./try -v -q 15 >vq15`, representing approx
  50M sets of points that (under symmetries) reduce to 10M distinct sets
- simultaneously run classification under each of those perls to find
  those duplicates:

% /usr/bin/time /opt/hash-blead/bin/perl ./classify -u -q <vq15
3594.48user 1.38system 59:55.77elapsed 100%CPU (0avgtext+0avgdata 2378648maxresident)k
7016inputs+0outputs (23major+593602minor)pagefaults 0swaps

% /usr/bin/time /opt/hash-nwc/bin/perl ./classify -u -q <vq15
3369.20user 1.99system 56:11.16elapsed 100%CPU (0avgtext+0avgdata 1905472maxresident)k
9330096inputs+0outputs (25major+475349minor)pagefaults 0swaps

Overall hash-nwc was 1/16 quicker on this work (6.25%), which is an
impressive speedup - it does quite a bit of non-hash work as well.

I observed in top(1) that around the halfway point, hash-blead was
using 10% more memory than hash-nwc (1.1GB v. 1.0GB), but the final
figure was 25% more (2.26GB v. 1.81GB). That seemed odd to me, I had
expected the ratio to stay pretty constant.

The 7016 v. 9330096 "inputs" look screwy - this is "Number of file system
inputs by the process", which should be identical between the two runs.

The 25% more pagefaults under blead sensibly matches the memory figures.

I also notice that this commit:

  Use each HEK's own flags to decide "shared or not", instead of the HV's

introduces a warning:

  hv.c: In function 'S_hv_free_ent_ret':
  hv.c:1767:29: warning: unused parameter 'hv' [-Wunused-parameter]
   S_hv_free_ent_ret(pTHX_ HV *hv, HE *entry)

.. by removing the last use of 'hv'.


Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About