Front page | perl.perl5.porters | Postings from October 2003

## 5.8.2-RC1 and mp2

From:
Scott A Crosby
Date:
October 30, 2003 06:29
Subject:
5.8.2-RC1 and mp2
Message ID:
oyd1xsuu7d8.fsf@bert.cs.rice.edu
```Hello. I've been monitoring this thread and I thought all this stuff
about rehashing was only for the 5.8.x series and only to preserve
binary compatibility. Thus, you'd revert back to 5.8.1 style
randomized hashes for 5.9.0?

Also, in [1]

(ignoring mod perl) I'm not convinced that it kicks in too way
early.  MJD calculates that the average linked list size on an HV
should be 1.6 entries. (ie there usually shouldn't be a linear
search) It's kicking in at 4, *after* a hash split failed to
partition the longest list below 4. So it's saving linear
searches of 4 elements.

You must check for under-attack on each hash insert. Only checking
when the hash table doubles means opens up the attack I described in
the thread 'Perl 5.8.1, plan C, and algorithmic complexity attacks.'

Also, in a private reply on that thread, MJD said:

If the hash has N buckets and K keys, then the probability that
any given bucket contains exactly i keys is

(N choose i) (N-1)^(K-i) / (N^K)

I posted this on p5p last week sometime.

This formula implies that in a sufficiently large hash, when K=N,
the average nonempty bucket contains 1.6 keys.  'Sufficiently
large' in the context of Perl is 'always'.

I was not able to calculate the closed form for the variance.
However, numeric calculation of the cases that will come up in
practice (say, up to N = 2^30) suggest that 5 * log_2 (n) is
extremely conservative.  It looks to me like it's more like

5 + log(N)/2

and even that is conservative.

Going by his advice, we should rehash in 5.8.2 when the chain length
is > 5+log(N)/2. This is assuming that there are \$N\$ buckets and at
most \$N\$ keys inserted. (IE, a load of 1.)

Scott

[1] http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-10/msg01472.html

```