develooper Front page | perl.perl5.porters | Postings from September 2021

Re: Robin Hood Hashing for the perl core

Thread Previous | Thread Next
From:
Nicholas Clark
Date:
September 10, 2021 09:27
Subject:
Re: Robin Hood Hashing for the perl core
Message ID:
YTsk7PLpxy2eJ8j1@etla.org
On Tue, Aug 03, 2021 at 04:54:56PM +0100, hv@crypt.org wrote:
> Nicholas Clark <nick@ccl4.org> wrote:
> :I'd like to suggest that we switch the core's hashtable implementation from
> :the current array of linked-lists (which is not friendly with CPU caches) to
> :"open addressing" hashing (which is). More specifically, to "Robin Hood
> :Hashing", and most specifically to the implementation that I wrote for
> :MoarVM (and Perl 5) - "A Better Hash".
> 
> How separable is it, would it be possible to add the new hash algorithm
> separately from the switch from linked lists? The latter sounds like it
> necessarily needs a big-bang approach, but we have prior experience in
> supporting alternate hash algorithms.

I think there's some confusion here. It doesn't help that "hash" can be
used to mean several things, and I guess "hash algorithm" too.

1) the function that maps arbitrary length keys to fixed size integers
   (hash function, generating a hash value)
2) an associative array
   (hash table - a data structure)

and I might have missed another one

We also have "seed", for the private hopefully-random integer which is used
to perturb the hash function so that external attackers can't pre-compute
the mapping from key to hash value.

Probably we should actually call that "salt", as that's the term for one
way hashes for passwords.


There isn't a new hash function. But currently where we're using 64 bit
hash function we truncate the output to 32 bits.

I "need" (I think the need is genuine) 64 bit hash values to make the
security work, so I changed the code to stop truncating them.

weaker)   hash bucket order is exposed on iteration (it can't be hidden)
stronger) mapping back from bucket order to get the salt requires figuring
          out 128 bits of secret random state


but it's only 96 bits if we stick to 32 bit hash values.

(Also, with 32 bit hash values we can't go beyond 2**32 keys in a hash,
whereas I wanted to make this thing future proof.)


And given that 32 -> 64 bit hash values needs some API "fun and games" it
seemed a good time to also move key lengths to size_t (so 64 bit where it
matters)

The API "fun and games" seemed to work just fine - I could build and test
all of CPAN that I tried. So I'm no longer worried about this part.

We could actually just do the 64 bit "API" with the existing linked list
hash structure, changing the relevant 2 members to 64 bits and size_t.

Nicholas Clark

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About