develooper Front page | perl.perl5.porters | Postings from October 2012

Re: Eliminating the "rehash" mechanism for 5.18

Thread Previous | Thread Next
October 30, 2012 01:15
Re: Eliminating the "rehash" mechanism for 5.18
Message ID:
On 30 October 2012 03:59, Ruslan Zakirov <> wrote:
> Hi,
> I agree with proposal to randomize the primary hash seed at the beginning.
> I agree that ReHASH is sub-optimal and brings too much penalty in cases
> when hash access optimized with precomputed hash value.

Yeah, agreed.

> However, read below on security considerations.
> On Tue, Oct 30, 2012 at 3:23 AM, Tony Cook <> wrote:
>> On Mon, Oct 29, 2012 at 06:55:21PM +0100, demerphq wrote:
>>> The attack that this is intended to defuse is that where an attacker
>>> constructs a series of string which are known to collide and then
>>> manages to feed them to a Perl process which uses them to construct a
>>> hash. All further access to the hash then becomes a search against an
>>> un-ordered linked list. This attack is more than a theoretical
>>> possibility because of  the fact that Perl has by default used a
>>> compile time determined constant and poorly chosen hash seed for its
>>> hash algorithm. Meaning that one can use a perl on one box to attack a
>>> perl on another box.
>> (/me puts his paranoid hat on)
>> For an attack on a long lived process, if the process ever returns the
>> contents of a hash in hashed order, could this be used to determine
>> the process hash seed?
>> If so, it could be used to perform the attack we're trying to prevent.
> I think that nobody tried to analyze perl's hash function for reversibility
> by feeding data and guessing hash seed from results.

No, I guess not. But based on what I see with per process hash seed
randomization I am doubtful that such an attack would be possible
outside of laboratory conditions.

> With rehash it becomes much harder because you need to find a set of
> keys that produce minimal number of buckets for two different random
> hash seeds.

No, that isnt correct. With the existing rehash mechanism I would say
it is as hard as attacking a random hash seed. I say this because
pretty well every perl will use that same function and seed for its
base hash. So the problem is not figuring out a set of keys which will
collide in two random hashes, but figuring out a set of keys which
will collide in one known hash and one random hash. And since the
known hash seed is not very good (and arguably the worst choice for a
seed for the algorithm used) the task of finding collisions in the
base hash is trivially easy. Once you find a set of colliding keys
that make the hash go into rehash mode you can build a second set of
keys independently that attacks the second hash seed. It doesn't
matter if the chain that triggered the rehash gets remapped well
distributed, they are only needed to force the hash to use the second
seed. You would then attack the random seed.

Even if we used a random hash seed and the rehash logic IMO it would
still only be a linear increase in attack time, not some kind of
combinatorial effect.

So I think that the current rehash mechanism is about as secure as the
random hash seed proposal.

> If we think about penalty then it's possible to use different hash function
> for rehashed hashes that still uses precomputed hash value. Simplest
> way is just reverse bits for rehashed rather than recompute. Better solution
> is to randomly extend key instead of changing hash seed, so core hash
> function is the same, seed is the same, but for rehashed hashes we add
> random (pregenerated) N bytes to the key when we calculate hash value.
> May be it's possible to store such random sequence in the hash itself, then
> I think it will be impossible to attack such rehash algorithm.

Any "per hash" logic is going to be problematic at the hash
function/seed level. One minor issue with per hash logic is that
traditionally the macros used for calculating the key have not taken
an HV as an argument. A much more important reason is that there is an
internal "master hash" which contains every key in use by perl at any
given time. All other keys are actually pointers to the key data in
the master hash. Store/update operations do operations in this hash as
well as in the actual hash being used.  For obvious reasons we do not
want to recalculate the hash key twice if we can possibly avoid it,
which means that every hash with shared keys has to produce the same
hash value. This means that any process we use to "per hash" the value
needs to be an operation on the common shared value. Some
possibilities are to rotate the bits in the hash value, or multiply by
a per hash constant, or whatever prior to looking up the bucket.

Personally I dont think its worth the effort of doing much more than
thinking about this until someone demonstrates at least a laboratory
grade attack. IMO in a real world environment with multi-host web
sites, web servers being restarted, load balancers, and etc, that
simple hash randomization is sufficient.  Seems like any attack would
require large numbers of fetch/response cycles and in general would
just not be effective in a real production environment. I would assume
that the administrators would notice the weird request pattern before
an attacker could discover enough information to cause damage. Same
argument for non-web services IMO.


perl -Mre=debug -e "/just|another|perl|hacker/"

Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About