develooper Front page | perl.perl5.porters | Postings from October 2003

Re: 5.8.2-RC1 and mp2

Thread Previous | Thread Next
Scott A Crosby
October 30, 2003 12:08
Re: 5.8.2-RC1 and mp2
Message ID:
On Thu, 30 Oct 2003 19:25:17 +0000, Nicholas Clark <> writes:

> > You must check for under-attack on each hash insert. Only checking
> > when the hash table doubles means opens up the attack I described in
> > the thread 'Perl 5.8.1, plan C, and algorithmic complexity attacks.'
> Sorry, not clear. It is. The logic is
> For a regular hash:
> Any "store" that is a change to an existing key returns immediately (as before)
> Any store that is the first entry in a bucket returns.
> Any other store (by implication) lengthens an existing chain. At this point:
> a) If a chain becomes longer than the threshold, a split is triggered.
> b) Also the old test (IIRC number of entries > number of buckets) also
>    triggers a split.
> The description above applies to what happens in a split. Therefore a
> rehash can be triggered immediately by any insert that lengthens an existing
> chain.

Enlightenment. Thanks, looks good.

> > Also, in a private reply on that thread, MJD said:
> > 
> >     If the hash has N buckets and K keys, then the probability that
> >     any given bucket contains exactly i keys is
> > 
> >           (N choose i) (N-1)^(K-i) / (N^K)
> > 
> >     I posted this on p5p last week sometime.
> > 
> >     This formula implies that in a sufficiently large hash, when K=N,
> >     the average nonempty bucket contains 1.6 keys.  'Sufficiently
> >     large' in the context of Perl is 'always'.
> > 
> >     I was not able to calculate the closed form for the variance.
> >     However, numeric calculation of the cases that will come up in
> >     practice (say, up to N = 2^30) suggest that 5 * log_2 (n) is
> >     extremely conservative.  It looks to me like it's more like
> > 
> >            5 + log(N)/2
> > 
> >     and even that is conservative.
> > 
> > Going by his advice, we should rehash in 5.8.2 when the chain length
> > is > 5+log(N)/2. This is assuming that there are $N$ buckets and at
> > most $N$ keys inserted. (IE, a load of 1.) 
> OK. Patches welcome from anyone who wishes to implement that.
> If whoever can figure out somewhere in the HV to store that number then
> we can cache it, and only recalculate on a split. How much of a speed
> hit is a log on a modern CPU?

No need. For N= 10^9, the constant would be 20. Just use that. Assume
you are under attack (or observing pathological inputs) if any hash
chain has >20 elements.

Sorry, no patch.

It leaves behind a VERY small opening for a very minor attack --- at
worst a slight degradation. Thats a small enough impact that no
attacker would bother with it.


Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About