Front page | perl.perl5.porters |
Postings from May 2003
hv.c patch - pathological hashes too easy
Thread Next
From:
Tye McQueen
Date:
May 1, 2003 11:28
Subject:
hv.c patch - pathological hashes too easy
Message ID:
200305011829.h41ITJC07814@metronet.com
Porters,
I patched this "bug" in hv.c a long time ago but that patch was never
applied. I'm trying again.
Here is the proof of the "bug":
#!/usr/bin/perl
use strict;
my %test;
@test{1..6}= ();
my %bug;
$|++;
for my $key ( 'aaa'..'zzz' ) {
$test{$key}++;
if( 6 == %test ) {
$bug{$key}++;
print $key, ' '.%bug.' ', 0+keys %bug, "\r";
}
delete $test{$key};
}
print $/;
for( 1..11 ) {
$bug{0}++;
delete $bug{0};
print "$_: ".%bug.' ', 0+keys %bug, $/;
}
__END__
Output:
zzz 6/8 13182
1: 12/16 13182
2: 24/32 13182
3: 48/64 13182
4: 96/128 13182
5: 192/256 13182
6: 384/512 13182
7: 768/1024 13182
8: 1536/2048 13182
9: 3072/4096 13182
10: 6132/8192 13182
11: 10217/16384 13182
So we have put 75% of the keys from 'aaa'..'zzz' into %bug, a total of
13,182 keys, and yet we never bothered to grow the number of buckets in
%bug. So each bucket contains a linked list of around 2200 items (which
makes the hash quite slow, nearly defeating the purpose of a hash). I
can put 87.5% of all keys into such a patholical hash, but the code to
demonstrate that is more complex (or almost 94% of all keys for a 16-
bucket hash, etc.).
If we then insert a few keys in a different way, we eventually repair
%bug such that is has 10217 occupied buckets, none of which has more
than 2 items in it (I checked that but don't include that code here
for brevity).
This also means (once the patch is applied) that to get the same length
of chains would require that I search a space of about 6,000,000 keys
in order to find less than 0.0004% of them to insert.
The fix is quite simple:
diff -r -U 3 perl-5.8.0/hv.c perl-tye/hv.c
--- perl-5.8.0/hv.c Thu May 01 10:39:38 2003
+++ perl-tye/hv.c Thu May 01 10:44:06 2003
@@ -658,8 +658,8 @@
xhv->xhv_keys++; /* HvKEYS(hv)++ */
if (i) { /* initial entry? */
xhv->xhv_fill++; /* HvFILL(hv)++ */
- if (xhv->xhv_keys > (IV)xhv->xhv_max /* HvKEYS(hv) > HvMAX(hv) */)
- hsplit(hv);
+ } else if (xhv->xhv_keys > (IV)xhv->xhv_max /* HvKEYS(hv) > HvMAX(hv) */)
+ hsplit(hv);
}
return &HeVAL(entry);
(end)
We are checking for the need to split the hash at the wrong time. My
previous patch just did the check every time. However, doing the
check only when a collision happens perhaps saves a couple of cycles
(which might have been part of why the previous patch was ignored)
while not allowing such patological hashes to be constructed.
Thanks,
Tye
Thread Next
-
hv.c patch - pathological hashes too easy
by Tye McQueen