
Re: [perl #130084] [PATCH] Increase hash collision ratio and reduce hsplit calls

From: demerphq
Date: November 14, 2016 09:05
Subject: Re: [perl #130084] [PATCH] Increase hash collision ratio and reduce hsplit calls
Message ID: CANgJU+X5XTzwE49LAWRbdXgHr3kKFAvQPf2azV-EbRuGke3OTw@mail.gmail.com
On 14 November 2016 at 01:58, Atoomic via RT <perlbug-followup@perl.org> wrote:
> Thanks for your time & feedback; the graphical visualization is definitely a plus.
> Indeed, it's all a matter of tradeoffs/tuning to find the best compromise.
> I would also not consider a load factor higher than 4.
>
> My main concern is that if we slow down hashes by reducing the memory usage, that would be a regression from my point of view...
> And from the naive tests I ran earlier, it seems that the load factor has a significant impact on run time.

For sure, but some of your test cases were at very high load
factors, like 16 or 64.

I think a max load factor of 4 is probably fine, but we can and should
benchmark, using Dave's Porting/bench.pl for proper analysis.

And we need a test set that is comprehensive. We basically have two
broad cases (key present, key absent), with a read and a write flavour
each:

read-hit / update
read-miss / insert

So we need a benchmark that builds a largish hash (IMO something in
the 1024-4096 range), then tests updating keys known to be in the
hash, reading keys known not to be in the hash, and building the hash
from scratch. We probably also want to use a constant key size.
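As a rough local sanity check, a minimal sketch along those lines
(using the core Benchmark module rather than Porting/bench.pl; the
4096 key count and the constant 16-byte key format are illustrative
assumptions):

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Build a hash of 4096 keys with a constant key size (16 bytes).
    my $N    = 4096;
    my @keys = map { sprintf "key%013d", $_ } 0 .. $N - 1;
    my %h;
    @h{@keys} = (1) x @keys;

    cmpthese(-3, {
        # read-hit: fetch keys known to be present
        read_hit  => sub { my $v; $v = $h{$_} for @keys },
        # update: overwrite keys known to be present
        update    => sub { $h{$_} = 2 for @keys },
        # read-miss: fetch keys known to be absent
        read_miss => sub { my $v; $v = $h{"x$_"} for @keys },
        # insert: build the hash (triggering its splits) from scratch
        insert    => sub { my %t; @t{@keys} = (1) x @keys },
    });

For comparing builds properly we would still want Porting/bench.pl,
which measures under cachegrind rather than by wall-clock time.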

> I was also considering an alternate solution: change the grow factor only at the beginning, for small hashes.
> That would avoid wasting memory once the hash starts to get big... Would this address your concern about wasted memory?

We need to find the right balance: a sizing policy that does not
over-allocate when there are lots of small hashes (think objects),
while also not wasting memory with a few large hashes.
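A purely illustrative sketch of the size-dependent policy being
proposed (the function name, the growth factors, and the 1024
threshold mentioned below are assumptions, not taken from the patch):

    # Hypothetical sizing policy: grow aggressively while the hash is
    # small, then fall back to doubling once it is "big".
    sub next_bucket_count {
        my ($buckets) = @_;
        my $growth = $buckets < 1024 ? 4 : 2;
        return $buckets * $growth;
    }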

Let's discuss this in person.

> Maybe this should be an option that could be set at compilation time? (This would not settle the question of whether we need/want to change the default value.)

Yes, for sure. My version of this patch is already structured along those lines.

> Any opinion on this? Of course the definition of "small/big" needs to be defined; I was thinking something like 1024.

As others have said, a big part of the problem with this discussion is
that hashes are used for something like three purposes:

1. objects, typically small, many repeated keys
2. reference tables, often large, few repeated keys
3. symbol tables, probably in the middle in terms of both key frequency and size

We have to be careful not to structure this such that we win in one
case, such as reference tables, but lose in the other two cases.

>
> I'm also fine leaving it as it is, but I have the feeling that this default setup can be optimized.

I definitely think we can and, benchmarks willing, should tweak some of this.

But I am still reluctant to change the grow factor. If I am correct
that the ideal for our purposes is a load factor between 2 and 4,
then growing by anything other than doubling will leave us at a load
factor smaller than 2 after a split, which wastes space for little
speed benefit.
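To make the arithmetic concrete, a small sketch (the maximum load
factor of 4 is the value under discussion above, not perl's current
setting):

    # A split is triggered when keys/buckets exceeds the max load
    # factor, so right before a split keys ~= max_load * buckets.
    my $max_load = 4;
    for my $growth (2, 4, 8) {
        # After growing the bucket array by $growth, the load factor
        # right after the split is:
        my $after = $max_load / $growth;
        printf "growth factor %d => post-split load factor %.2f\n",
            $growth, $after;
    }

Only doubling lands us back inside the 2-4 band; a growth factor of 4
would drop the load factor straight to 1.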

cheers,
Yves
