On Mon, Oct 29, 2012 at 06:55:21PM +0100, demerphq wrote:

> Perl currently has a defense against attacks on its hash algorithm where it conditionally enables the use of a random hash seed when it calculates the hash value for a string. This mode is triggered by encountering an overly long bucket chain during insert. Once this bit is set, all further hash operations have to go through additional layers of logic. All of this support has a per-operation cost.
>
> The attack that this is intended to defuse is one where an attacker constructs a series of strings which are known to collide and then manages to feed them to a Perl process which uses them to construct a hash. All further access to the hash then becomes a search against an unordered linked list. This attack is more than a theoretical possibility because Perl has by default used a compile-time constant and poorly chosen hash seed for its hash algorithm, meaning that one can use a perl on one box to attack a perl on another box.
>
> The solution to this problem is simple: merely choose a random hash seed at process start-up, as it effectively prevents the attacker from constructing their colliding keys remotely. However, it also means that the order in which a set of keys is stored in a hash becomes random. On the face of it one would think this is not an issue; after all, the order in which keys are returned from a hash has been documented as undefined for a long time. However, if the perl core tests and the bundled modules it comes with are anything to go by, it turns out that while people are not necessarily expecting the order to be well defined, they do seem to expect that the order is consistent and repeatable between invocations, perl versions, etc.
>
> The rehash mechanism more or less works around the problem of hash order expectations by only enabling the randomization in degenerate hashes. But it achieves this at the cost of added per-operation overhead in hot code paths and additional complexity in the code. I believe that this was done at the time because of time constraints for the 5.8.0 release and a lack of time to warn the community about it in advance.
>
> I propose that in 5.18 we should finally take the plunge and switch to a fully randomized hash seed and eliminate the rehash mechanism. I have patches which do this ready, and for the cpan/ modules involved I have pushed patches upstream to get the hash order dependencies removed.
>
> Besides the advantage of avoiding the costs of the rehash mechanism, this change will smoke out any hash order dependency bugs in the modules on CPAN, and in our own code. (I found at least two unrelated bugs in core code due to hash randomization.) It will also prepare the ground for people to choose alternative hash algorithms for Perl to use, or to make it possible to use specialized hash algorithms in certain contexts such as 64-bit builds.
>
> The downside, however, is that it will almost certainly make lots of lazily written code fail. But IMO the cost is worth it.
>
> What do people think?

We have had to deal with this issue for quite some time in Moose - minor changes to internal implementation details can change the order in which things are inserted into hashes, causing hash iteration order to change, and this ends up looking like a Moose bug when in reality it's because people are relying on hash iteration order staying consistent.
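The fragile pattern looks something like the following minimal sketch (the hash and the expected string are made up for illustration): a test compares against whatever key order the author happened to see first, so any change to insertion order or to the hash seed breaks it, while sorting the keys keeps it stable.

    use strict;
    use warnings;

    # Hypothetical order-dependent check: passes only while keys()
    # happens to return this particular order.
    my %attr = ( name => 1, size => 2, color => 3 );

    my $got = join ',', keys %attr;
    print $got eq 'name,size,color' ? "ok\n" : "not ok - got $got\n";

    # The fix: impose an explicit order before comparing.
    my $sorted = join ',', sort keys %attr;
    print $sorted eq 'color,name,size' ? "ok\n" : "not ok\n";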
If hash iteration order is always random on a per-invocation level, these bugs can never be introduced in the first place, which would save a lot of pain when refactoring things.

-doy
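A rough illustration of that effect, assuming a perl with the per-process random hash seed proposed above: run the same one-liner twice, and the printed key order will usually differ between the two invocations, so order-dependent code fails on the very next run instead of lying dormant until a refactor disturbs it.

    $ perl -E 'my %h = map { $_ => 1 } "aa" .. "az"; say join ",", keys %h'
    $ perl -E 'my %h = map { $_ => 1 } "aa" .. "az"; say join ",", keys %h'
    # with a randomized hash seed the two runs will almost certainly
    # print the 26 keys in different orders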