Re: wanted: hash stress tests
From: Leopold Toetsch
May 22, 2005 07:56
Message ID: 42909DF8.firstname.lastname@example.org
Bob Rogers wrote:
> Below please find an additional test case for t/pmc/hash.t that defines
> >50K keys, while checking that earlier entries are still present. This
> takes about 0.8 sec on my 1.8GHz AMD box, tunable by increasing I22. Is
> this the sort of thing you had in mind?
Yeah. Thanks. It's OTOH failing here, I'll have to look at it, then it'll
go in. Another one which additionally deletes keys would be great.
> One question: Why is there a space to store computed hash codes in
> struct parrot_string_t but not in HashBucket?
Having it in the bucket is a space penalty for objects that can
provide a hash value easily, like e.g. Integer PMCs. More complicated
objects like strings can therefore cache their hash val inside their
string structure.
> ... Part of the reason I am
> asking is because I have started thinking about how to implement Common
> Lisp hashes, for which (a) keys can be arbitrary objects, (b) object
> equivalence is not defined in terms of stringification, and (c) hash
> computation is not necessarily cheap.
It's like in Python. The Py PMCs already implement a hash vtable -
albeit a slightly buggy one for double or complex.
> ... The standard technique is
> therefore to store each key's computed hash in the bucket, which seems
> like it would be a win for Parrot as well.
We can do that too, but that's more or less an optimization thingy which
needs some existing RL code to benchmark.
>> I don't see the difference WRT shareable,
> It sounds like Uri meant that it could be mapped at different addresses
> for different processes . . . but maybe not, because then the key and
> value pointers would be useless.
A shared PMC exists just once; there is no address mapping. Different
processes of course need distinct PMC data; we can't share data, just code.
> This may be too obvious to be worth pointing out, but if hash internals
> are made persistent, then the hash code computation needs to be
> invariant from one interpreter instantiation to another. For this
> reason, the "seed" member of struct _hash is probably counterproductive.
Or the seed is frozen too.
> Not to mention the fact that initializing it randomly would make it
> harder to test hash iteration, since the key enumeration order would
> then be nondeterministic.
Well, the reason *is* exactly to make it nondeterministic. The perl5
archives should have a lot of discussion WRT that, keyword DOS attacks.
> FWIW, Common Lisp defines an SXHASH function that must exhibit
> just this invariance characteristic, and must also compute a definite
> answer for circular structures. Time to revive the "hash" opcode?
It depends on how we define the equality of objects or aggregates, i.e.
if or how deeply that might recurse. If so, then the hash *vtable* needs
a change to cooperate with the visit vtable, like freeze or thaw. But
that would need a change for is_equal too.
>  http://www.lispworks.com/documentation/HyperSpec/Body/f_sxhash.htm