Front page | perl.perl5.porters |
Postings from October 2003
Re: 5.8.2-RC1 and mp2
From: Scott A Crosby
October 30, 2003 12:08
Re: 5.8.2-RC1 and mp2
Message ID: firstname.lastname@example.org
On Thu, 30 Oct 2003 19:25:17 +0000, Nicholas Clark <email@example.com> writes:
> > You must check for under-attack on each hash insert. Only checking
> > when the hash table doubles means opens up the attack I described in
> > the thread 'Perl 5.8.1, plan C, and algorithmic complexity attacks.'
> Sorry, not clear. It is. The logic is
> For a regular hash:
> Any "store" that is a change to an existing key returns immediately (as before)
> Any store that is the first entry in a bucket returns.
> Any other store (by implication) lengthens an existing chain. At this point:
> a) If a chain becomes longer than the threshold, a split is triggered.
> b) Also the old test (IIRC number of entries > number of buckets) also
> triggers a split.
> The description above applies to what happens in a split. Therefore a
> rehash can be triggered immediately by any insert that lengthens an existing
Enlightenment. Thanks, looks good.
> > Also, in a private reply on that thread, MJD said:
> > If the hash has N buckets and K keys, then the probability that
> > any given bucket contains exactly i keys is
> > (N choose i) (N-1)^(K-i) / (N^K)
> > I posted this on p5p last week sometime.
> > This formula implies that in a sufficiently large hash, when K=N,
> > the average nonempty bucket contains 1.6 keys. 'Sufficiently
> > large' in the context of Perl is 'always'.
> > I was not able to calculate the closed form for the variance.
> > However, numeric calculation of the cases that will come up in
> > practice (say, up to N = 2^30) suggest that 5 * log_2 (n) is
> > extremely conservative. It looks to me like it's more like
> > 5 + log(N)/2
> > and even that is conservative.
> > Going by his advice, we should rehash in 5.8.2 when the chain length
> > is > 5+log(N)/2. This is assuming that there are $N$ buckets and at
> > most $N$ keys inserted. (IE, a load of 1.)
> OK. Patches welcome from anyone who wishes to implement that.
> If whoever can figure out somewhere in the HV to store that number then
> we can cache it, and only recalculate on a split. How much of a speed
> hit is a log on a modern CPU?
No need. For N= 10^9, the constant would be 20. Just use that. Assume
you are under attack (or observing pathological inputs) if any hash
chain has >20 elements.
Sorry, no patch.
It leaves behind a VERY small opening for a very minor attack --- at
worst a slight degradation. Thats a small enough impact that no
attacker would bother with it.