Re: wanted: hash stress tests
From: Bob Rogers
May 22, 2005 19:46
Message ID: email@example.com
From: Leopold Toetsch <firstname.lastname@example.org>
Date: Sun, 22 May 2005 16:58:00 +0200
Bob Rogers wrote:
> Below please find an additional test case for t/pmc/hash.t that defines
> >50K keys, while checking that earlier entries are still present. This
> takes about 0.8 sec on my 1.8GHz AMD box, tunable by increasing I22. Is
> this the sort of thing you had in mind?
Yeah. Thanks. It's OTOH failing here; I have to look at it, then it'll go
in. Another one which additionally deletes keys would be great.
Good idea; I'll work on that. (I assume the failure was because of the
bug I introduced into the emailed version?)
> One question: Why is there a space to store computed hash codes in
> struct parrot_string_t but not in HashBucket?
Having it in the bucket is a space penalty for other objects that can
provide a hash value easily, e.g. Integer PMCs. More complicated
objects like strings can therefore cache their hash value inside their
own structure.
Hmm. Seems to me that this space penalty is proportional to the number
of keys for keeping it in the HashBucket, and proportional to the number
of objects for keeping it in the objects. Since the number of string
keys (e.g.) must be smaller than the total number of strings, and may be
much smaller, it seems like it would be better to keep them in the
HashBucket, even though that's somewhat redundant for such things as
Integer PMCs. True?
> Not to mention the fact that initializing it randomly would make it
> harder to test hash iteration, since the key enumeration order would
> then be nondeterministic.
Well, the reason *is* exactly to make it nondeterministic. The perl5
archives should have a lot of discussion WRT that; keyword: DOS attacks.
If DOS attacks on hashes are a consideration, then using the seed purely
as an initializer won't be enough. In particular, I note that
key_hash_cstring produces the same hash code for the strings "cB" and
"bc" regardless of the seed value, so they would always collide. Doing
h += hash->seed + *p++;
in the iteration would make it harder to attack, IMHO. (But IANASE
("security expert"). ;-)
On the other hand, I would argue that it's way too early to apply
this kind of low-level security hardening; it's a lot of work, and it's
too easy for someone to accidentally undo it in a rapidly-evolving
system by helpfully "simplifying" the code.
In any case, I take your point that the seed will eventually need to
be initialized randomly -- as long as it can be set deterministically
for testing/debugging purposes, I'm happy.
> FWIW, Common Lisp defines an SXHASH function that must exhibit
> just this invariance characteristic, and must also compute a definite
> answer for circular structures. Time to revive the "hash" opcode?
It depends on how we define the equality of objects or aggregates, i.e.
if or how deeply that might recurse . . .
True. That is all set in stone for Common Lisp, but other languages
may have other requirements. I imagine all languages will need to be
able to compute hashes on keys from other languages, so a universal hash
operator that treats its argument as opaque would be useful. (Since the
GC doesn't move objects, the object address may be sufficient.) But
thinking about which language operator and/or object should decide
whether to recur into a given object for purposes of equality testing
gives me a headache . . .