develooper Front page | perl.perl5.porters | Postings from August 2021

Re: Robin Hood Hashing for the perl core

Thread Previous | Thread Next
Nicholas Clark
August 18, 2021 14:37
Re: Robin Hood Hashing for the perl core
Message ID:
On Wed, Aug 18, 2021 at 01:09:47PM +0000, Ed Avis wrote:
> Could I mention one more topic related to hashing, which might turn out to be related to the proposed new implementation.  When a reference is used as a hash key, I believe that currently it gets stringified to something like ARRAY(0xabc) and this string is then hashed.  (Correct me if I'm wrong.)

Yes, but if it's blessed then the class name is part of that:

$ perl -lwe 'my $obj = bless {}; print $obj'

and if has overloaded stringification, it can do whatever it wants.

Which becomes relevant...

> For the programmer this is a bit awkward, since you can't get back the original objects by iterating over the keys.  You have to use a wrapper like Tie::RefHash, which is slower, and awkward for multidimensional data structures.  (There is Tie::RefHash::Nestable for hashes of hashes, but even that doesn't cover a hash of arrays of hashes, and so on.)
> But I think it must also be slower.  Surely if you have the reference you can treat it as an address (a 64-bit or 32-bit integer) and have a fast way to transform that to a position in the hash data structure.  Even if the "address hashing" function ends up being fairly complex, it still has to be faster than converting the address to ASCII text and then hashing the string.  Storing the string will also use more memory than storing a pointer plus a tag to say what type of object.
> So I am wondering whether, if the hash internals get reworked for greater speed, the treatment of references can also change.  Then using references as a hash key would be faster.  Moreover, it could be fast to get back the original references rather than a stringified version -- perhaps with a new opcode or new builtin that is like 'keys' but doesn't stringify.  Tie::RefHash could change to call that, for compatibility with existing programs, while new programs could use the builtin directly and be even quicker (no tying overhead).
> I have asked before whether references could always be preserved in hash keys -- so "Tie::RefHash all of the time".  There are some corner cases where this would change the semantics, such as adding to a hash both a string and a reference which stringifies to the same thing, even if you ignore the question of code that explicitly checks ref() on each hash key returned.  So for now I'm not advocating this as a global change.  However, I think if Perl could provide a fast builtin way to use references as hash keys and get them back unharmed, it would be very useful for new code.  Is that something that can be fitted into the new hashing implementation?

Even with the current hash implementation, I think that this would be simple
enough for the extremely constrained corner case of

* exclusively references
* "hashed"/equality determined by address

(and either with the current hash or ABH, it needs more C code, and more
if statements or function pointers or something else equivalent...)

because one would

1) just store the pointer to the SV
2) do the numeric hashing using the pointer address, and comparison using
   pointer address

and that "hashing"/"comparison" step is *not* using the same approach as
Perl currently uses for (string) comparison between references.

You can't (then) also store regular strings as keys in the same hash.

Unless you then change to using the current stringification of references
as your "key" for hashing. (even if you calculate it as needed, or cache it)
as otherwise (as you observe) effectively you could store duplicate keys
(the true reference, and its stringification)

But then you hit another problem:

$ perl -lwe 'my $obj = bless {}; print $obj; bless $obj, "other"; print $obj'

"I" can change the string that a reference hashes to, after I insert it into
the hash.

So is that disallowed? If so how?

And this also can't support references that want to create objects that
act as "value" types. So these two are different:

$ perl -MMath::BigInt -MScalar::Util=refaddr -lwe 'my $x = Math::BigInt->new(42); my $y = Math::BigInt->new(42); print $x == $y ? "==" : "!="; print refaddr($x) == refaddr($y) ? "Same" : "different"'

but that means that

* either one has to use exactly the same object to look up in the hash
* or one has to then implement some form of overloaded comparison

the first makes it extremely inflexible and counter intuitive
the second is not going to be massively performant

I don't see a winning move here.

Nicholas Clark

Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About