develooper Front page | perl.perl5.porters | Postings from August 2021

Re: Robin Hood Hashing for the perl core

Thread Previous | Thread Next
From:
Nicholas Clark
Date:
August 6, 2021 08:07
Subject:
Re: Robin Hood Hashing for the perl core
Message ID:
20210806080546.GP11066@etla.org
On Fri, Aug 06, 2021 at 12:41:56AM -0700, Darren Duncan wrote:
> On 2021-08-06 12:20 a.m., Martijn Lievaart wrote:
> > 
> > Op 06-08-2021 om 03:13 schreef Darren Duncan:
> > > On 2021-08-04 1:01 a.m., Smylers wrote:
> > > > Let's debate a specific proposal on its own merits, and not discard it,
> > > > or derail the discussion, in the hope of something bigger and vaguer
> > > > that possibly could happen in the future.
> > > 
> > > I fully agree.
> > > 
> > > As the saying goes, the perfect is the enemy of the good.

As I wrote in my orginal message:

    We can in turn replace this with something even better once code for that
    exists. But until then, don't let the perfect be the enemy of the good.

> > > The current discussion should just focus on whether the current
> > > proposed new solution is significantly better than what Perl has now
> > > and thus whether Perl switching to it would be an improvement.
> > > 
> > > Consideration of other algorithms can be their own separate
> > > discussions if or when someone comes up with a specific example and
> > > makes a case it is better than the winner of the current discussion,
> > > including that someone has already committed to do the work etc.

Yes, exactly that.

> > Yes, but a question that may be asked is, can this be generalized to
> > support different algorithms easily, so programmers can choose the
> > hashing algorithm at runtime. And if it can, is it worth to take such a
> > broadening of the scope on at the same time as handling the current
> > proposal. If it is easy to implement, it should be looked at.
> > 
> > I suspect the answer to this is a firm no, but it never hurts to look at
> > this. Anyone who knows the internals who can say something about that?

1) I am not confident that it's easy to implement
2) *I* am not in a position to attempt it
3) I'm not in a position to mentor someone else who might attempt it
4) I doubt I'd even be in a position to review the code if someone did
   attempt it.
   (I estimate that it even if someone else started now, it would take them
    a couple of months at least, and I by then I won't have as much free time
    as I currently have)

I'm not stopping anyone doing this, but based on previous experience of
how often large good quality surprise contributions arrive, I doubt that
it is going to happen.

> I would say that at this stage the best argument for having that feature of
> hashing algorithm choice is:
> 
> 1. If it turns out from measuring that some common use cases perform way
> better with the current algorithm and others perform way better with the new
> one, and users would then benefit from being able to choose to tune their
> scenarios.
> 
> 2. Also implementing the ability to choose isn't too much effort or has too
> poor trade-offs of is own.
> 
> 3. If going forward the added maintenance cost of two algorithms rather than
> one is worth it.

That's a good summary.

We have finite programmer time.

(eg - I think all but two messages in this thread so far have been from
people who can consider designs and trade offs, but can't review the C code)

If we want to maintain two (or three) effectively we'd be stealing from
one to feed the other(s).

Much like the previous PSC opinion on sort, I'm going to extrapolate and
suggest that it's only better to have 2+, if it's clear that a significant
amount of the user base are actually going to actively *need* to use
different hashing algorithms.

But if all we really end up with is an "optional feature" that rarely
anyone uses (because it costs them more to learn about it and evaluate it
than they gain by choosing it), then we've wasted effort that could have
been spent better elsewhere.


So if someone else steps up an implements another hash, great.

If someone else steps up and figures out how to switch between 2 (or more),
great.

And most of what they need is the approach taken in this branch (cleaned up)
anyway, to get to the point where it was possible to switch at all.


But don't wait for this. Don't assume that it's going to happen.

But until then we should consider what we know is possible, rather than
distract ourselves with talk about what someone else might do.


Nicholas Clark

PS I doubt I'll reply to all the relevant messages in this thread (or others)
   today.

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About