develooper Front page | perl.perl5.porters | Postings from November 2012

Re: Eliminating the "rehash" mechanism for 5.18

Thread Previous | Thread Next
From:
demerphq
Date:
November 1, 2012 00:45
Subject:
Re: Eliminating the "rehash" mechanism for 5.18
Message ID:
CANgJU+UUMYXrZTWB9WFw7+r4wojNqNo22pq5OX9mWzEGiBkd+Q@mail.gmail.com
On 1 November 2012 08:23, bulk 88 <bulk88@hotmail.com> wrote:
>
>
>
> ----------------------------------------
>> Date: Mon, 29 Oct 2012 18:55:21 +0100
>> Subject: Eliminating the "rehash" mechanism for 5.18
>> From: demerphq@gmail.com
>> To: perl5-porters@perl.org
>> CC: nick@ccl4.org; rjbs@manxome.org
>>
>> Perl currently has a defense against attacks on its hash algorithm
>> where it conditionally enables the use of a random hash seed when it
>> calculates the hash value for a string. This mode is triggered by
>> encountering an overly long bucket chain during insert. Once this bit
>> is set all further hash operations have to go through additional
>> layers of logic. All of this support has a per-operation cost.
> ..................
>> The rehash mechanism more or less works around the problem of hash
>> order expectations by only enabling the randomization in degenerate
>> hashes. But it achieves this at the cost of added per operation
>> overhead in hot code paths and additional complexity in the code. I
>> believe that this was done at the time because of time constraint
>> issues for the 5.8.0 release and a lack of time to warn the community
>> about it in advance.
> ....................
>>
>> Besides the advantages of avoiding the costs of the rehash mechanism
>> this change will smoke out any hash order dependency bugs in the
>> modules on CPAN, and in our own code. (I found at least two unrelated
>> bugs in code in core due to hash randomization.) It will also prepare
>> the ground for people to choose alternative hash algorithms for Perl
>> to use, or to make it possible to use specialized hash algorithms in
>> certain contexts such as 64 bit builds.
>>
>> The downside however will be that it almost certainly will make lots
>> of lazily written code to fail. But IMO the cost is worth it.
>>
>> What do people think?
>
> What will happen to XS code that uses the PERL_HASH macro?

It will be fine. Probably better off as there will be no REHASH
malarky for them to deal with.  XS code can now calculate the hash key
themselves and be confident that that is the only hash key relevant.

> I disagree there is a measurable cost to the current implementation of
> REHASH. If the rehash code is so rare, maybe it should be removed from
> hv_common and placed in its own func, but looking at the asm, the
> overhead is so tiny I dont think the rehash code even matters compared
> to the rest of the design of hv_common. I don't see any performance
> gains by removing it.

Yes, I am unable to show any actual performance gains either.

I still believe the rehash mechanism should be removed to reduce
complexity in the code and to prevent code from ossifying around a
given hash order and to ensure that edge case bugs are tickled.

Changing to a different hash algorithm will break as many tests as
hash randomization and as I have posted elsewhere there IS a clear
performance win in doing so. The future may bring further better hash
functions so I would like to see hash randomization go in to future
proof ourselves against those changes.

> In the past I've done benchmarking on pre
> calculating the hash number to give to hv_common_key_len. I couldn't
> get a consistent benchmark difference between with and without
> the hash number when calling hv_common_key_len in a loop. Having
> the PV in the HEK and the char * given hv_common_key_len be
> exactly the same gave a measurable improvement for me in the same
> hv_common_key_len in a loop benchmark situation.

Can you explain a bit more the latter point?

> For the asm code,
>  masked_flags (stack var reused) was earlier set to HvREHASH(hv)
> as an optimization by the compiler. The whole overhead in the
> hash algorithm macro to support rehash is the following. Just
> a t/f conditional between 2 32 bit read. ebx contains my_perl.

Thanks for the detailed analysis. It is very interesting.

Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About