On Tue, Aug 03, 2021 at 04:54:56PM +0100, hv@crypt.org wrote: > Nicholas Clark <nick@ccl4.org> wrote: > :I'd like to suggest that we switch the core's hashtable implementation from > :the current array of linked-lists (which is not friendly with CPU caches) to > :"open addressing" hashing (which is). More specifically, to "Robin Hood > :Hashing", and most specifically to the implementation that I wrote for > :MoarVM (and Perl 5) - "A Better Hash". > > How separable is it, would it be possible to add the new hash algorithm > separately from the switch from linked lists? The latter sounds like it > necessarily needs a big-bang approach, but we have prior experience in > supporting alternate hash algorithms. I think there's some confusion here. It doesn't help that "hash" can be used to mean several things, and I guess "hash algorithm" too. 1) the function that maps arbitrary length keys to fixed size integers (hash function, generating a hash value) 2) an associative array (hash table - a data structure) and I might have missed another one We also have "seed", for the private hopefully-random integer which is used to perturb the hash function so that external attackers can't pre-compute the mapping from key to hash value. Probably we should actually call that "salt", as that's the term for one way hashes for passwords. There isn't a new hash function. But currently where we're using 64 bit hash function we truncate the output to 32 bits. I "need" (I think the need is genuine) 64 bit hash values to make the security work, so I changed the code to stop truncating them. weaker) hash bucket order is exposed on iteration (it can't be hidden) stronger) mapping back from bucket order to get the salt requires figuring out 128 bits of secret random state but it's only 96 bits if we stick to 32 bit hash values. (Also, with 32 bit hash values we can't go beyond 2**32 keys in a hash, whereas I wanted to make this thing future proof.) And given that 32 -> 64 bit hash values needs some API "fun and games" it seemed a good time to also move key lengths to size_t (so 64 bit where it matters) The API "fun and games" seemed to work just fine - I could build and test all of CPAN that I tried. So I'm no longer worried about this part. We could actually just do the 64 bit "API" with the existing linked list hash structure, changing the relevant 2 members to 64 bits and size_t. Nicholas ClarkThread Previous | Thread Next