
Re: Eliminating the "rehash" mechanism for 5.18

From: demerphq
Date: October 29, 2012 11:11
Subject: Re: Eliminating the "rehash" mechanism for 5.18
Message ID: CANgJU+Wu8UvRe_LwkFMuXVEMmEhcJw18od5RwP=rJVfnhB6+Sw@mail.gmail.com
On 29 October 2012 19:03, Jesse Luehrs <doy@tozt.net> wrote:
> On Mon, Oct 29, 2012 at 06:55:21PM +0100, demerphq wrote:
>> Perl currently has a defense against attacks on its hash algorithm:
>> it conditionally enables the use of a random hash seed when it
>> calculates the hash value for a string. This mode is triggered by
>> encountering an overly long bucket chain during an insert. Once this
>> bit is set, all further hash operations have to go through additional
>> layers of logic, and all of this support has a per-operation cost.
>>
>> The attack this is intended to defuse is one where an attacker
>> constructs a series of strings which are known to collide and then
>> manages to feed them to a Perl process which uses them to construct a
>> hash. Every further access to the hash then becomes a search through
>> an unordered linked list. This attack is more than a theoretical
>> possibility because Perl has by default used a compile-time constant
>> and poorly chosen hash seed for its hash algorithm, meaning that one
>> can use a perl on one box to attack a perl on another box.
>>
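
To make the degenerate case easier to picture, here is a minimal
sketch (the keys are placeholders, not a real colliding set). It peeks
at bucket usage via a hash in scalar context, which reports used
versus allocated buckets (newer perls may report just the key count
instead):

use strict;
use warnings;

# With a sane hash function these keys should spread over several
# buckets.
my %h = map { $_ => 1 } qw(foo bar baz quux);

# This prints something like "4/8": the number of buckets in use
# versus the number allocated. A successful collision attack drives
# every key into a single bucket, so each lookup degrades into a walk
# of one long chain.
print scalar(%h), "\n";
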
>> The solution to this problem is simple: choose a random hash seed at
>> process start-up, which effectively prevents the attacker from
>> constructing their colliding keys remotely. However, it also means
>> that the order in which a set of keys is stored in a hash becomes
>> random. On the face of it one would think this is not an issue; after
>> all, the order in which keys are returned from a hash has been
>> documented as undefined for a long time. However, if the perl core
>> tests and the bundled modules are anything to go by, it turns out
>> that while people are not necessarily expecting the order to be well
>> defined, they do expect it to be consistent and repeatable between
>> invocations, between perl versions, and so on.
>>
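
As a minimal sketch of the expectation that breaks, the script below
can print its keys in a different order on a different perl build, and
with a fully randomized seed it can differ between two runs of the
same perl:

use strict;
use warnings;

my %h = (foo => 1, bar => 2, baz => 3);

# The key order here is undefined; with a per-process random hash
# seed it can change between invocations, so anything that records
# this output and compares it against a later run is relying on
# behaviour that was never guaranteed.
print join(",", keys %h), "\n";
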
>> The rehash mechanism more or less works around the problem of hash
>> order expectations by enabling the randomization only in degenerate
>> hashes. But it achieves this at the cost of added per-operation
>> overhead in hot code paths and additional complexity in the code. I
>> believe it was done this way at the time because of schedule pressure
>> on the 5.8.0 release and a lack of time to warn the community about
>> the change in advance.
>>
>> I propose that in 5.18 we should finally take the plunge, switch to a
>> fully randomized hash seed, and eliminate the rehash mechanism. I
>> have patches ready which do this, and for the cpan/ modules involved
>> I have pushed patches upstream to get the hash order dependencies
>> removed.
>>
>> Besides avoiding the costs of the rehash mechanism, this change will
>> smoke out hash order dependency bugs in the modules on CPAN and in
>> our own code. (I found at least two unrelated bugs in core code due
>> to hash randomization.) It will also prepare the ground for people to
>> choose alternative hash algorithms for Perl to use, or to make it
>> possible to use specialized hash algorithms in certain contexts such
>> as 64-bit builds.
>>
>> The downside, however, is that it will almost certainly make a lot of
>> lazily written code fail. But IMO the cost is worth it.
>>
>> What do people think?
>
> We have had to deal with this issue for quite some time in Moose -
> minor changes to internal implementation details can change the order
> in which things are inserted into hashes, causing hash iteration order
> to change, and this ends up looking like a Moose bug when in reality
> it's because people are relying on hash iteration order staying
> consistent in general. If hash iteration order is always random on a
> per-invocation level, these bugs can never be introduced in the first
> place, which would save a lot of pain when refactoring things.

Indeed. I encountered at least one insert order dependency bug as a
side effect of a hash order dependency, which came down to something
like this:

use Test::More tests => 1;

my %hash = (foo => 1, bar => 2, baz => 3);
my %copy = %hash;

is_deeply([keys %hash], [keys %copy]);

which would pass on old perls because all three keys ended up in
different buckets. Using hash randomization or a different hash
algorithm which resulted in more than one of these keys being mapped
to the same bucket would break the test. Of course so would adding a
colliding key to the test on the old perl...
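
As a sketch of an order-independent way to write that kind of test,
one can compare a canonical ordering of the keys instead of the raw
iteration order:

use strict;
use warnings;
use Test::More tests => 1;

my %hash = (foo => 1, bar => 2, baz => 3);
my %copy = %hash;

# Sorting removes the dependency on bucket layout, so this passes
# regardless of hash seed or hash algorithm.
is_deeply([sort keys %hash], [sort keys %copy], 'same key set');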

cheers,
Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
