Front page | perl.perl5.porters |
Postings from December 2016
Re: Hash change... Hybrid OAATH variant for short keys, Siphash 1-3for longer keys
Thread Previous
|
Thread Next
From:
Dave Mitchell
Date:
December 7, 2016 12:41
Subject:
Re: Hash change... Hybrid OAATH variant for short keys, Siphash 1-3for longer keys
Message ID:
20161207124134.GR23564@iabyn.com
On Wed, Dec 07, 2016 at 01:30:58PM +0100, demerphq wrote:
> On 7 December 2016 at 12:21, Dave Mitchell <davem@iabyn.com> wrote:
> > On Wed, Dec 07, 2016 at 12:30:35AM +0100, demerphq wrote:
> >> I just pushed a patch which optimises our hash function somewhat by
> >> using a hybrid approach, with an optimised OAATH for short keys (16
> >> bytes or less), and Siphash 1-3, for the rest.
> >>
> >> For longer keys, say 50 byte or up, Siphash 1-3 makes a big
> >> difference. By the time you get to 1k keys or longer it makes a HUGE
> >> difference.
> >>
> >> For shorter keys I unrolled the OAATH logic so it doesn't have so much
> >> overhead. The new OAATH code is a touch different in that it hashes
> >> back to front.
> >
> > I can't comment on the technical aspects of hash choice, but thanks.
> >
> >> Here are the timings from the change using Porting/bench.pl for keys
> >> of length 1..24, 50, 100, and 1000. I believe most keys in a process
> >> are short, hence the bias in the benchmarks.
> >
> > Just one thing - the tests in t/perf/benchmarks are currently intended to
> > be lightweight, e.g. a test might measure the cost of a single assignment
> > or hash lookup. Your tests are individually processing a thousand keys,
> > which makes them completely dominate the results when averaged using
> > --raw. Also, there is an existing test namespace, expr::hash:: which
> > should be reused. I'd also suggest using exists, and this does a hash
> > generation and lookup, but without the costs of creating or freeing hash
> > elements. Putting that all together, I'd suggest applying something
> > like the attached.
>
> Sure, no problem. I was wondering if that might end up attaching the
> hash value to the key, so the speed of the hash function wont be
> properly tested. Do we run that code in a loop?
The code is run a loop, so what's left over at the end of one iteration
can indeed affect the next. Using exists ensures that neither key nor
value is added to the hash, and is instead only looked up.
--
print+qq&$}$"$/$s$,$a$d$g$s$@$.$q$,$:$.$q$^$,$@$a$~$;$.$q$m&if+map{m,^\d{0\,},,${$::{$'}}=chr($"+=$&||1)}q&10m22,42}6:17a2~2.3@3;^2dg3q/s"&=~m*\d\*.*g
Thread Previous
|
Thread Next