
Re: [External] Re: non-shared hash keys and large hashes (was Re:Robin Hood Hashing for the perl core)

From: Nicholas Clark
Date: October 22, 2021 09:54
Subject: Re: [External] Re: non-shared hash keys and large hashes (was Re:Robin Hood Hashing for the perl core)
Message ID: YXKKSmRMz2Y34EOX@etla.org

On Wed, Oct 20, 2021 at 02:59:06PM +0100, hv@crypt.org wrote:

> TLDR: yes please, my test case was 6.25% faster, and used 20% less memory.

> Overall hash-nwc was 1/16 quicker on this work (6.25%), which is an
> impressive speedup - it does quite a bit of non-hash work as well.

And not a fork in sight (if I have it right).

That is, this speedup comes from a different use case than the one Yves was
thinking would get a meaningful speedup. This is just one process that uses
less RAM and hence (presumably) incurs fewer cache misses etc.

> I also notice that this commit:
> 
>   Use each HEK's own flags to decide "shared or not", instead of the HV's
> 
> introduces a warning:
> 
>   hv.c: In function 'S_hv_free_ent_ret':
>   hv.c:1767:29: warning: unused parameter 'hv' [-Wunused-parameter]
>    S_hv_free_ent_ret(pTHX_ HV *hv, HE *entry)
> 
> .. by removing the last use of 'hv'.

Thanks. I didn't notice this because I was doing debugging builds, and they
cause macros such as PERL_ARGS_ASSERT_HV_FREE_ENT_RET to be non-empty, and
here that was asserting that hv wasn't NULL, so it "was" used.
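
The effect described above can be reproduced with the standard assert()
macro. This is only a minimal sketch, not the perl source: the types and
the function name are hypothetical, and assert() stands in for the
PERL_ARGS_ASSERT_* style macro.

```c
#include <assert.h>
#include <stddef.h>

struct table { int nkeys; };       /* hypothetical container type */
struct entry { int refcount; };    /* hypothetical entry type */

/* The body never reads 'tbl'; only the assertion mentions it.
 * Debugging build (NDEBUG undefined): assert() expands to a real check,
 *   so 'tbl' counts as used and -Wunused-parameter stays quiet.
 * Non-debugging build (-DNDEBUG): assert() expands to ((void)0), nothing
 *   references 'tbl', and the compiler can report it as unused. */
int
entry_release(struct table *tbl, struct entry *e)
{
    assert(tbl != NULL);
    return --e->refcount;
}
```

Compiling this with something like `gcc -Wextra -DNDEBUG -c` should show
the warning, while the same command without `-DNDEBUG` should not.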

I fixed that commit in the rebase so as not to warn. And then kept pulling
at the loose ends, which became this:

commit f892bc2ea53230c3397936db20b9e658950f924e
Author: Nicholas Clark <nick@ccl4.org>
Date:   Thu Oct 21 18:53:01 2021 +0000

    Drop the unused hv argument from S_hv_free_ent_ret()

    In turn, this means that the hv argument to Perl_hv_free_ent() and
    Perl_hv_delayfree_ent() is now clearly unused, so mark it as such. Both
    functions are deemed to be API, so unlike the static function
    S_hv_free_ent_ret we can't simply change their parameters.

    However, change all the internal callers to pass NULL instead of the hv, as
    this makes it obvious that the function does not read hv, and might cause
    the compiler to generate better code.
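
The pattern in that commit message can be sketched roughly as follows.
All names here (SKETCH_UNUSED_ARG, struct table, struct entry and the
three functions) are hypothetical stand-ins for illustration, not the
actual perl functions or macros.

```c
#include <stddef.h>

/* Hypothetical macro for marking a parameter as deliberately unused. */
#define SKETCH_UNUSED_ARG(x) ((void)(x))

struct table { int nkeys; };        /* hypothetical container type */
struct entry { int refcount; };     /* hypothetical entry type */

/* Internal helper: not part of any public interface, so the unused
 * table parameter can simply be dropped from its signature. */
static int
entry_release_internal(struct entry *e)
{
    return --e->refcount;
}

/* Public function: outside callers depend on this signature, so 'tbl'
 * stays in the parameter list and is only marked as unused. */
int
entry_release(struct table *tbl, struct entry *e)
{
    SKETCH_UNUSED_ARG(tbl);
    return entry_release_internal(e);
}

/* Internal caller: passing NULL makes it obvious at the call site that
 * the callee does not read the table. */
int
table_drop_entry(struct entry *e)
{
    return entry_release(NULL, e);
}
```

Keeping the parameter in the public signature means existing outside
callers still compile unchanged, while the NULL at internal call sites
documents that the value is ignored.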


On Wed, Oct 20, 2021 at 09:46:18AM +0200, Yves Orton via perl5-porters wrote:
> On Tue, Oct 19, 2021 at 9:30 PM Nicholas Clark <nick@ccl4.org> wrote:

> Yeah, my bad. I wrote that quickly in between meetings by memory, and tbh I
> havent used fork directly in ages. We have a wrapper around it that we use
> here at Booking.com that makes it much less easy to blow your foot off. I
> should release it, it makes writing multi forking code trivial.

Yes, that seems like it would be useful.

Also, fixing code you wrote from memory was much faster than me starting
from scratch.

Both your example and Hugo's example seem to demonstrate that I'm not very
good at translating other people's rough descriptions of "my code does this
hot thing, so write a benchmark that times that" into correct code that
really does the hot thing such that the benchmark shows changes.

> That is a huge win. More than 25% it would seem.

I thought it looked that good, but was rather surprised that it was that
obvious. I'm not trusting my measurements here. :-)


"No plan survives contact with the enemy", but the plan is that we're going
away for a bit, so it is possible that I'm "Away From Keyboard" for the next
week (or only intermittently near it). So I've not tried doing the other
things that you've suggested.

However, based on running your benchmark and Hugo's different benchmark, I
figured that we want this change, and maybe it's now more about figuring
out whether the heuristic is correct. So I made a PR:

https://github.com/Perl/perl5/pull/19208

> I'll see what I can do.

I'm curious what some of your work-scale code (and data) makes of it. But I
realise that might not be easy to test.

> BTW, replying from my work account primarily because the mail never arrived
> at my normal account.

gmail hates me. Sometimes it thinks that I'm a spammer. Sometimes not.
Sometimes *for the same message* sent to the list - person A gets it,
person B does not. (This last one is particularly confusing)

I'm not sure what more I can do to tell the faceless gmail that I am
genuine long pig, and not some canned pink "meat" thing.

Nicholas Clark
