develooper Front page | perl.perl5.porters | Postings from August 2021

Re: Robin Hood Hashing for the perl core

Thread Previous | Thread Next
From:
Nicholas Clark
Date:
August 18, 2021 10:16
Subject:
Re: Robin Hood Hashing for the perl core
Message ID:
20210818101548.GW11066@etla.org
On Sat, Aug 07, 2021 at 11:24:56AM +0100, Dave Mitchell wrote:

Sorry for the delay in replying. I've been on holiday. There were loads of
pumpkins there, but I think that they will require a lot of training up
before they become useful to us:

https://gist.github.com/nwc10/37a85e5272bcfc460b77809ec648d518

> On Tue, Aug 03, 2021 at 02:27:42PM +0000, Nicholas Clark wrote:
> > The proposed implementation "open addressing" eliminates the linked lists by
> > storing HE structs in a flat array:
> 
> So in effect HvARRAY() changes from being an array of SV pointers to an
> array of HE structs?

Effectively yes (except that you meant HE struct pointers. It's AvARRAY
that is SV pointers. Well, SV**)

Which isn't the same thing - in as much as I can't see any way to fake up
an interface that offers an array of pointers-to (so can be passed around
as values of type HE **) when all that exists in memory is an array of HE *,
short of

1) allocating memory for an entire array of HE **
2) initialising it correctly
3) "there you go"
4) freeing it ("later")

hence why I eliminated the macro HvARRAY and renamed the entry in the union
that it wrapped. "Break early, break often..." ish.

> How will this affect the hv_fetch_ent() and similar functions, which
> return a pointer to a HE? In particular what happens to such a pointer if
> the HvARRAY() gets reallocated in the meantime. E.g.
> 
>     HE *he = hv_fetch_ent(hv, ...);
>     hv_store(hv,...); /* XXX triggers HvARRAY realloc */
>     do_something_with(he->val); /* using freed memory! */

It will go wrong the way that you figured out.

We could work around *this* problem by deferring the memory free onto the
save stack - the entire hash structure is a single block of memory.

But array reallocation only happens on insert, and insert can move things
within the allocated block.


Which got me thinking more generally...

https://grep.metacpan.org/search?q=hv_fetch_ent.%2A%5C%28

thinks that there are 128 distributions using hv_fetch_ent. That's small
enough to audit (but not small enough to do while writing an e-mail) to
see if any do this.

(and 85 using hv_store_ent, most of which use it in void context)


I'm going to guess (but haven't checked this yet) that few to none do this.
Any that do can be patched. (Like with HvARRAY)

At which point we can say all of

1) We have audited *all* of CPAN for expected problems
2) We are confident that the entire public ecosystem is good
3) Roughly 0.0something% of CPAN distributions were affected
4) If you have your own XS code, you should audit for these things [...]


where we have a good feel for how small 0.0something is.


But really this is the only general guarantee that we can make for any
stable release

1) We checked CPAN
2) We figured out the probability of breakage
3) If it's published, we have found it and fixed it already
4) If it's not published, we can't fix that for you


Nicholas Clark

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About