On 29 October 2012 19:03, Jesse Luehrs <doy@tozt.net> wrote:
> On Mon, Oct 29, 2012 at 06:55:21PM +0100, demerphq wrote:
>> Perl currently has a defense against attacks on its hash algorithm where it conditionally enables the use of a random hash seed when it calculates the hash value for a string. This mode is triggered by encountering an overly long bucket chain during insert. Once this bit is set, all further hash operations have to go through additional layers of logic. All of this support has a per-operation cost.
>>
>> The attack that this is intended to defuse is one where an attacker constructs a series of strings which are known to collide and then manages to feed them to a Perl process which uses them to construct a hash. All further access to the hash then becomes a search against an unordered linked list. This attack is more than a theoretical possibility because Perl has by default used a constant, poorly chosen hash seed determined at compile time, meaning that one can use a perl on one box to attack a perl on another box.
>>
>> The solution to this problem is simple: choose a random hash seed at process startup, which effectively prevents the attacker from constructing their colliding keys remotely. However, it also means that the order in which a set of keys is stored in a hash becomes random. On the face of it one would think this is not an issue; after all, the order in which keys are returned from a hash has been documented as undefined for a long time. However, if the perl core tests and the bundled modules it comes with are anything to go by, it turns out that while people are not necessarily expecting the order to be well defined, they do seem to expect that the order is consistent and repeatable between invocations, between perl versions, and so on.
>>
>> The rehash mechanism more or less works around the problem of hash order expectations by only enabling the randomization in degenerate hashes. But it achieves this at the cost of added per-operation overhead in hot code paths and additional complexity in the code. I believe that this was done at the time because of time constraints for the 5.8.0 release and a lack of time to warn the community about it in advance.
>>
>> I propose that in 5.18 we should finally take the plunge and switch to a fully randomized hash seed and eliminate the rehash mechanism. I have patches which do this ready, and for the cpan/ modules involved I have pushed patches upstream to get the hash order dependencies removed.
>>
>> Besides the advantage of avoiding the costs of the rehash mechanism, this change will smoke out any hash order dependency bugs in the modules on CPAN, and in our own code. (I found at least two unrelated bugs in core code due to hash randomization.) It will also prepare the ground for people to choose alternative hash algorithms for Perl to use, or to make it possible to use specialized hash algorithms in certain contexts such as 64-bit builds.
>>
>> The downside, however, is that it will almost certainly cause a lot of lazily written code to fail. But IMO the cost is worth it.
>>
>> What do people think?
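(As an aside, before Jesse's reply below: for anyone who wants to see what full randomization means in practice, here is a quick sketch. Run the one-liner a few times; under a perl that picks a random hash seed at process start the printed key order will usually differ from one run to the next, while a stock perl with its fixed compile-time seed prints the same order every time.)

    # run several times; with a per-process random seed the order usually changes
    perl -le 'my %h = map { $_ => 1 } "a" .. "j"; print join " ", keys %h'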
> We have had to deal with this issue for quite some time in Moose - minor changes to internal implementation details can change the order that things are inserted into hashes, causing hash iteration order to change, and this ends up looking like a Moose bug when in reality it's because people are relying on hash iteration order in general staying consistent. If hash iteration order is always random on a per-invocation level, these bugs will never be able to be introduced in the first place, which would save a lot of pain when refactoring things.

Indeed. I encountered at least one bug related to insert order dependency as a side effect of a hash order dependency, which came down to something like this:

    use Test::More tests => 1;

    my %hash = (foo => 1, bar => 2, baz => 3);
    my %copy = %hash;
    is_deeply([keys %hash], [keys %copy]);

This would pass on old perls because all three keys ended up in different buckets. Using hash randomization, or a different hash algorithm which resulted in more than one of these keys being mapped to the same bucket, would break the test. Of course, so would adding a colliding key to the test on the old perl...

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
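PS: for anyone who runs into this kind of failure, the usual fix is to compare the key sets in a canonical order rather than in whatever order the buckets happen to produce. A minimal sketch, using the same Test::More-style test as above:

    use Test::More tests => 1;

    my %hash = (foo => 1, bar => 2, baz => 3);
    my %copy = %hash;

    # sort both key lists so the comparison no longer depends on bucket order
    is_deeply([sort keys %hash], [sort keys %copy]);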