develooper Front page | perl.perl5.porters | Postings from October 2021

Re: [External] Re: non-shared hash keys and large hashes (was Re:Robin Hood Hashing for the perl core)

Thread Previous | Thread Next
Yves Orton via perl5-porters
October 22, 2021 11:58
Re: [External] Re: non-shared hash keys and large hashes (was Re:Robin Hood Hashing for the perl core)
Message ID:
On Fri, Oct 22, 2021 at 11:54 AM Nicholas Clark <> wrote:

> On Wed, Oct 20, 2021 at 02:59:06PM +0100, wrote:
> > TLDR: yes please, my test case was 6.25% faster, and used 20% less
> memory.
> > Overall hash-nwc was 1/16 quicker on this work (6.25%), which is an
> > impressive speedup - it does quite a bit of non-hash work as well.
> And not a fork in sight (if I have it right)
> In that this speedup is not the same use case as Yves was thinking would
> get
> a meaningful speedup. This is just one process that uses less RAM and hence
> (presumably) fewer cache misses etc.

Yeah, removing the extra store/refcount juggling should be noticeable. I
have a problem similar to Hv's that I am waiting on being able to test. Ill
report back when I can.

> > I also notice that this commit:
> >
> >   Use each HEK's own flags to decide "shared or not", instead of the HV's
> >
> > introduces a warning:
> >
> >   hv.c: In function 'S_hv_free_ent_ret':
> >   hv.c:1767:29: warning: unused parameter 'hv' [-Wunused-parameter]
> >    S_hv_free_ent_ret(pTHX_ HV *hv, HE *entry)
> >
> > .. by removing the last use of 'hv'.
> Thanks. I didn't notice this because I was doing debugging builds, and they
> cause macros such as PERL_ARGS_ASSERT_HV_FREE_ENT_RET to be non-empty, and
> here that was asserting that hv wasn't NULL, so it "was" used.
> I fixed that commit in the rebase so as not to warn. And then kept pulling
> at the loose ends, which became this:
> commit f892bc2ea53230c3397936db20b9e658950f924e
> Author: Nicholas Clark <>
> Date:   Thu Oct 21 18:53:01 2021 +0000
>     Drop the unused hv argument from S_hv_free_ent_ret()
>     In turn, this means that the hv argument to Perl_hv_free_ent() and
>     Perl_hv_delayfree_ent() is now clearly unused, to mark it as such. Both
>     functions are deemed to be API, so unlike the static function
>     S_hv_free_ent_ret we can't simply change their parameters.
>     However, change all the internal callers to pass NULL instead of the
> hv, as
>     this makes it obvious that the function does not read hv, and might
> cause
>     the compiler to generate better code.
> On Wed, Oct 20, 2021 at 09:46:18AM +0200, Yves Orton via perl5-porters
> wrote:
> > On Tue, Oct 19, 2021 at 9:30 PM Nicholas Clark <> wrote:
> > Yeah, my bad. I wrote that quickly in between meetings by memory, and
> tbh I
> > havent used fork directly in ages. We have a wrapper around it that we
> use
> > here at that makes it much less easy to blow your foot off. I
> > should release it, it makes writing multi forking code trivial.
> Yes, seems that would useful.
Ack. Willdo.

> Also, fixing code you wrote from memory was much faster than me starting
> from scratch.
> Both your example and Hugo's example seem to demonstrate that I'm not very
> good at translating other people's rough descriptions of "my code does this
> hot thing, so write a benchmark that times that" into correct code that
> really does the hot thing such that the benchmark shows changes.

Too close to the forest probably. Not to worry. You did the important bit.
We can work up the benchmarks.

> > That is a huge win. More than 25% it would seem.
> I thought it looked that good, but was rather surprised that it was that
> obvious. I'm not trusting my measurements here. :-)

It isnt inconceivable to me. We saw some really interesting effects using
perl to aggregate really large cardinality data sets, your data IMO
corroborates my suspicions it was due to the PL_strtab. I am kinda kicking
myself I didnt release a newHV_shared() in one of the Hash::Util modules to
test this previously. Too many things to do.  Note the latter would
actually allow us to benchmark the unshared aspect without your core
changes. Its a trivial patch too.

I still think it would be really helpful to have a pragma to turn off
shared hash creation in scope. Something like:

  no shared 'hashes';
  my %hash;
  for my (@$tuples) {
      my ($x,$y,$z)=@$tuple;

IN such a case having a "switch to unshared when keysize exceeds X" would
save a lot of conversions from shared to unshared.

Alternatively, we could switch the sharing on at bless time, and only
blessed hashes would be shared.

> "No plan survives contact with the enemy", but the plan is that we're going
> away for a bit, so it is possible that I'm "Away From Keyboard" for the
> next
> week (or only intermittently near it). So I've not tried doing the other
> things that you've suggested.
> However based on running your benchmark and Hugo's different benchmark, I
> figured that we want this change, and maybe it's now more about figuring
> out whether the heuristic is correct. So I made a PR:
> > I'll see what I can do.
> I'm curious what some of your work-scale code (and data) makes of it. But I
> realise that might not be easy to test.
Its more about getting code built. I have asked our resident build expert
to help but he was/is off.

> > BTW, replying from my work account primarily because the mail never
> arrived
> > at my normal account.
> gmail hates me. Sometimes it thinks that I'm a spammer. Sometimes not.
> Sometimes *for the same message* sent to the list - person A gets it,
> person B does not. (This last one is particularly confusing)
> I'm not sure what more I can do to tell the faceless gmail that I am
> genuine long pig, and not some canned pink "meat" thing.
Nod, well i didnt see you in the spam folder either, but ill check again in
the future.


Yves Orton
Principal Developer & Fellow
[image:] <>
Making it easier for everyone
to experience the world.

Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About