develooper Front page | perl.perl5.porters | Postings from September 2021

Re: Robin Hood Hashing for the perl core

Thread Previous | Thread Next
Nicholas Clark
September 1, 2021 13:50
Re: Robin Hood Hashing for the perl core
Message ID:
On Mon, Aug 30, 2021 at 01:53:20PM +0100, Dave Mitchell wrote:
> On Wed, Aug 18, 2021 at 10:15:48AM +0000, Nicholas Clark wrote:
> > > How will this affect the hv_fetch_ent() and similar functions, which
> > > return a pointer to a HE? In particular what happens to such a pointer if
> > > the HvARRAY() gets reallocated in the meantime. E.g.
> > > 
> > >     HE *he = hv_fetch_ent(hv, ...);
> > >     hv_store(hv,...); /* XXX triggers HvARRAY realloc */
> > >     do_something_with(he->val); /* using freed memory! */
> > 
> > It will go wrong the way that you figured out.
> Perhaps we should update the docs for hv_*ent() now to state that the
> returned value has a limited lifetime, and mustn't be used after any
> further 'write' to the hash?

Whatever else happens, yes I think that this is a good idea.

But more generally, if you're totally paranoid you can't assume anything
about it if you let the hash out of your sight, as anything you call might
cause the hash entry to be deleted, and hence the HE to be freed.

Your code doesn't own a reference to it.

I doubt that there's anything that evil out there though.

On Mon, Aug 30, 2021 at 01:51:50PM +0100, Dave Mitchell wrote:
> On Wed, Aug 18, 2021 at 01:09:47PM +0000, Ed Avis wrote:
> > I have asked before whether references could always be preserved in hash
> > keys
> Just a general observation here about proposals to change or improve hash
> semantics. I've been meaning for many years (but lacked tuits) to introduce
> hash vtables, which would allow for differing hash implementations to be
> called on a per-hash basis (so you could declare a hash as being of a
> key-order-preserving or ref-as-key-preserving type for example).

Yes. At least, "yes" is what I thought at first.

But I think for this case, it won't help most of the time.

"I don't want hash entries to move underneath me" is a constraint wanted by
the call site (the third party XS code)

"This hash can move about"/"This hash does not" would be a property of the
vtable, and you couldn't "turn it on/off" for any hash passed into the XS

> Implementing this would then allow proposals such as yours to implemented
> as optional extras without risking breaking backwards compatibility or
> internals efficiency.

So I don't think that it can implement this, at least not "risk free"

I'm also not convinced about the efficiency. At the lowest level, it's a call
through a function pointer, which I thought that CPUs didn't totally enjoy.

But one level up, I can see (and in that branch) have made code and structure
assumptions based on how the parts of the hash fit together.

Trying to be general, particularly with iterators, is more complex.

I'm also sort of starting to think that there are two "levels" of hash
we're dealing with (anyway)

1) the low(er) level "dumb" hash that stores things, and can be iterated
   - a data structure
2) SVt_PVHV, which can trigger arbitrary code on actions (such as tie)
   - an object

and the VTABLEs for the two somewhat differ.

For the lower level, I think the minimal efficient API we get away with is

* build
* demolish
* fetch
* LVALUE fetch
* delete returning

and ideally

* pre-grow this hash

store is implemented as LVALUE fetch.
  or LVALUE fetch is emulated with fetch and then maybe store
delete returning would be emulated with fetch and then delete

and then we also want iteration, which I think is:

* build
* demolish
* current key
* current value
* advance
* finished?

but really we'd like to have first class iterator (ie 2+ read-only
can exist for the same hash at the same time)

and that API I've sketched out above

1) doesn't map 1 to 1 with the current tied hash API
2) doesn't consider whether we need to have "bless" at the API level

and the current hash API is nuts with tied hashes.

"you call store, and then if it returns 0 you actually need to call set magic
on your value", IIRC

so it's not obvious to me how "tied hashes" fit into what the core needs to
call down to. Or what shape a sane API should be for the core to expose,
such that you can actually pass tied hashes into unsuspecting XS code and,
well, it actually works.

Nicholas Clark

Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About