develooper Front page | perl.perl5.porters | Postings from May 2013

Re: What does PERL_HASH_SEED do?

Thread Previous | Thread Next
From:
demerphq
Date:
May 6, 2013 14:42
Subject:
Re: What does PERL_HASH_SEED do?
Message ID:
CANgJU+V=ABVJK6uh=q+H9oKNbwoYdioiQnFoMHGJ2e-oko2_fA@mail.gmail.com
On 4 May 2013 12:50, demerphq <demerphq@gmail.com> wrote:
>> My bottom line is that there needs to some mechanism (be it PERL_HASH_SEED
>> or something else, but preferably PERL_HASH_SEED, since that's a known
>> quantity) that allows you to repeatedly run a fixed piece of code and get
>> the same results; i.e. something that causes the ordering of hv_clear()
>> and keys/values/each to be consistent across runs.
>>
>> I'm happy to do the work if you think it's an achievable goal.
>
> Well, its my mess really, so I should clean it up. (I'd be happy for
> you to review the patches once I write them of course!)

Ok, I did some work on this and have a working patch. But it turns out
that there are more options than I realized.

First a summary of the current obfuscation principles at work in the
hash algorithm:

1. Controlling the hash seed - this changes which hash_value a given
string is mapped to, and thus to which bucket it is store in.
(1a. Hash function choice - a user might build perl with a different
hash function than the default - but irrelevant in this thread)
2. Bucket chain insertion decision - decides if items being inserted
into an occupied bucket are inserted to top, or second to top.
3. Hash traversal xor mask - this determines the order the buckets
will be visited during keys(), values(), each().

Item 1 is what is affected by setting PERL_HASH_SEED.

Item 2 and 3 are controlled by a single global variable
PL_hash_rand_bits, which is updated in various scenarios, such as the
creation of a new HvAUX struct, insertion into a hash, or an hsplit()
operation to change the size of the bucket array. It is updated by
mixing in both deterministic values like the hash_value of the item
being inserted, and by mixing in non-deterministic values like the
pointer value of a HE * or the pointer value of the HvARRAY.

This means we have the possibility of providing two sets of behavior.
We can emulate legacy behavior by preventing PL_hash_rand_bits from
being updated at all. And we can provide a deterministic version of
the current behavior by preventing PL_hash_rand_bits from being
updated in a non deterministic way.

I have just pushed a patch, slightly under polished (IE, no tests,
possibly other issues) to smoke-me/yves_perturb_keys. While I will of
course try to get this finished off myself given time constraints I
would have no objections whatsoever to someone finishing up for me.
Just send me a note so I dont duplicate the effort.

For ease of reading the commit message is as follows:

commit 199efdb0e1e94c3140669c256f0bf9c9412e9828
Author: Yves Orton <demerphq@gmail.com>
Date:   Sun May 5 16:33:43 2013 +0200

    Make it possible to disable and control hash key traversal randomization

    Adds support for PERL_PERTURB_KEYS environment variable, which in
turn allows one to control
    the level of randomization applied to keys() and friends.

    When PERL_PERTURB_KEYS is 0 we will not randomize key order at all. The
    chance that keys() changes due to an insert will be the same as in
    previous perls, basically only when the bucket size is changed.

    When PERL_PERTURB_KEYS is 1 we will randomize keys in a non repeatedable
    way. The chance that keys() changes due to an insert will be very high.
    This is the most secure and default mode.

    When PERL_PERTURB_KEYS is 2 we will randomize keys in a repeatedable way.
    Repititive runs of the same program should produce the same output every
    time. The chance that keys changes due to an insert will be very high.

    This patch also makes PERL_HASH_SEED imply a non-default
    PERL_PERTURB_KEYS setting. Setting PERL_HASH_SEED=0 (exactly one 0) implies
    PERL_PERTURB_KEYS=0 (hash key randomization disabled), settng PERL_HASH_SEED
    to any other value, implies PERL_PERTURB_KEYS=2 (deterministic/repeatable
    hash key randomization). Specifying PERL_PERTURB_KEYS explicitly to a
    different level overrides this behavior.

    Includes changes to allow one to compile out various aspects of the
    patch. One can compile such that PERL_PERTURB_KEYS is not respected, or
    can compile without hash key traversal randomization at all. Note that
    support for these modes is incomplete, and currently a few tests will
    fail.

    Also includes a new subroutine in Hash::Util::hash_traversal_mask()
    which can be used to ensure a given hash produces a predictable key
    order (assuming the same hash seed is in effect). This sub acts as a
    getter and a setter.

    NOTE - this patch lacks tests, but I lack tuits to get them done quickly,
    so I am pushing this with the hope that others can add them afterwards.


Here is an example of the effect.

First, we run a script with PERL_HASH_SEED_DEBUG to find a seed value.
Note setting for PERTURB_KEYS.

$ PERL_HASH_SEED_DEBUG=1 ./perl -le'for ("A".."Z") { $_{$_}=$_; print
0+keys %_,": ", join("", keys %_); }'
HASH_FUNCTION = ONE_AT_A_TIME_HARD HASH_SEED = 0xf5867c55039dc724
PERTURB_KEYS = 1 (RANDOM)
1: A
2: AB
3: BAC
4: CABD
5: DBCAE
6: DFBCAE
7: DGFBCAE
8: DFCGBAEH
9: HAEBIGCFD
10: BAEHIGFCJD
11: CFDJAEHBKIG
12: FCLJDBKAEHIG
13: KMBHAEIGFCLJD
14: IGBKMAEHJDNFCL
15: HAEKMBIGCLFODNJ
16: HMBGCLFODJEAKINP
17: IEAKPNGHBMDJQCLOF
18: IEAKRNPGHMBDQJCLFO
19: PRNKEASIOFCLJQDBMHG
20: MBHGFOTCLQJDKEAISRNP
21: JQDOTFCLGBMUHPRNSIKEA
22: KEASIPRNBMUVHGOTFCLJQD
23: CLFOTDQJHUVMBGWRNPEAKIS
24: PRNSIKXEAJQDOTFCLWGBMUVH
25: DQJCLFOTGYWHUVMBRNPISEAKX
26: ZRNPKXEAISFOTCLQJDMBHUVYWG

Then run program again using the same key. Note the PERTURB_KEYS mode
has changed because of this.

$ PERL_HASH_SEED_DEBUG=1 PERL_HASH_SEED=0xf5867c55039dc724 ./perl
-le'for ("A".."Z") { $_{$_}=$_; print 0+keys %_,": ", join("", keys
%_); }'
HASH_FUNCTION = ONE_AT_A_TIME_HARD HASH_SEED = 0xf5867c55039dc724
PERTURB_KEYS = 2 (DETERMINISTIC)
1: A
2: AB
3: BAC
4: BACD
5: DCAEB
6: DCAEFB
7: BFAECGD
8: AEHBGCFD
9: BHAEIGFCD
10: CFDJHAEBIG
11: DJCFIGAEHBK
12: IGHAEKBDJCLF
13: CLFDJHAEKMBIG
14: FCLJNDBKMAEHIG
15: OFCLJNDBKMAEHIG
16: FOCLJDMBHGNPKEAI
17: QJDFOCLGMBHNPIKEA
18: EAKINRPHMBGCLFODQJ
19: PNRSIEAKDJQCLOFGHBM
20: ISEAKNRPGHMBDQJCLFOT
21: SIEAKPNRGUHBMDJQCLOTF
22: SIEAKPNRGUVHBMDJQCLOTF
23: ISKEANRPWGMBHUVQJDFOTCL
24: XKEAISNRPMBHUVWGFOTCLQJD
25: HUVMBGWYCLFOTDQJEAXKISNRP
26: PNRZSIEAXKDJQCLOTFGWYUVHBM

Run it again, and note it produces the same output:

$ PERL_HASH_SEED_DEBUG=1 PERL_HASH_SEED=0xf5867c55039dc724 ./perl
-le'for ("A".."Z") { $_{$_}=$_; print 0+keys %_,": ", join("", keys
%_); }'
HASH_FUNCTION = ONE_AT_A_TIME_HARD HASH_SEED = 0xf5867c55039dc724
PERTURB_KEYS = 2 (DETERMINISTIC)
1: A
2: AB
3: BAC
4: BACD
5: DCAEB
6: DCAEFB
7: BFAECGD
8: AEHBGCFD
9: BHAEIGFCD
10: CFDJHAEBIG
11: DJCFIGAEHBK
12: IGHAEKBDJCLF
13: CLFDJHAEKMBIG
14: FCLJNDBKMAEHIG
15: OFCLJNDBKMAEHIG
16: FOCLJDMBHGNPKEAI
17: QJDFOCLGMBHNPIKEA
18: EAKINRPHMBGCLFODQJ
19: PNRSIEAKDJQCLOFGHBM
20: ISEAKNRPGHMBDQJCLFOT
21: SIEAKPNRGUHBMDJQCLOTF
22: SIEAKPNRGUVHBMDJQCLOTF
23: ISKEANRPWGMBHUVQJDFOTCL
24: XKEAISNRPMBHUVWGFOTCLQJD
25: HUVMBGWYCLFOTDQJEAXKISNRP
26: PNRZSIEAXKDJQCLOTFGWYUVHBM

Override the PERL_PERTURB_KEYS level and use the previous seed, but
randomize in a non repeatable way:

$ PERL_HASH_SEED_DEBUG=1 PERL_HASH_SEED=0xf5867c55039dc724
PERL_PERTURB_KEYS=1 ./perl -le'for ("A".."Z") { $_{$_}=$_; print
0+keys %_,": ", join("", keys %_); }'
HASH_FUNCTION = ONE_AT_A_TIME_HARD HASH_SEED = 0xf5867c55039dc724
PERTURB_KEYS = 1 (RANDOM)
1: A
2: BA
3: ACB
4: DCAB
5: DBAEC
6: CAEFBD
7: DGFBCAE
8: BHEAGFCD
9: IGEAHBDCF
10: IGEAHBDJCF
11: IGEAHBKDJCF
12: FCLJDKBHEAIG
13: HEAMKBIGCLFDJ
14: CLFNDJHEAMKBIG
15: BMKEAHIGOFCLJND
16: FOCLJDMBHGNPKEAI
17: KEAIPNBMHGOFCLJQD
18: BMHGOFCLJQDKEAIPNR
19: CLFODQJHMBGNRPEAKIS
20: PNREAKSICLTOFDJQHBMG
21: NRPISEAKDQJCLFTOGHUMB
22: CLFTODQJHVUMBGNRPEAKIS
23: KEAISNRPMBHVUWGFTOCLQJD
24: WGMBHVUQJDFTOCLISKXEANRP
25: WYGMBHVUQJDFTOCLISKXEANRP
26: SIKXEAPNRZWYGBMVUHJQDTOFCL

Do it again, note the output is different this time:

$ PERL_HASH_SEED_DEBUG=1 PERL_HASH_SEED=0xf5867c55039dc724
PERL_PERTURB_KEYS=1 ./perl -le'for ("A".."Z") { $_{$_}=$_; print
0+keys %_,": ", join("", keys %_); }'
HASH_FUNCTION = ONE_AT_A_TIME_HARD HASH_SEED = 0xf5867c55039dc724
PERTURB_KEYS = 1 (RANDOM)
1: A
2: BA
3: BCA
4: DBAC
5: DEACB
6: DEACBF
7: BFEACDG
8: GEAHBDCF
9: HEABIGCFD
10: BHEAIGFCJD
11: JDFCIGBKEAH
12: BKEAHIGFCLJD
13: KMBHEAIGFCLJD
14: IGBKMEAHJDNFCL
15: HEAKMBIGCLFODNJ
16: GBMHJDOFCLIKAEPN
17: CLOFDJQHBMGPNAEKI
18: GHMBDQJCLFOIAEKRNP
19: PRNKAESIOFCLJQDBMHG
22: RNPKAEISFOTCLQJDMBHUVG
23: RNPISKAEQJDFOTCLWGMBHUV
24: SIKXAEPRNWGBMUVHJQDOTFCL
25: DJQCLOTFGYWUVHBMPRNSIAEKX
26: GYWUVHBMDJQCLOTFSIAEKXPRNZ

Override the PERL_PERTURB_KEYS level and use the previous seed, and
disable the hash key traversal randomization completely. Note the
sequence does not change line to line except at powers of two.

$ PERL_HASH_SEED_DEBUG=1 PERL_HASH_SEED=0xf5867c55039dc724
PERL_PERTURB_KEYS=0 ./perl -le'for ("A".."Z") { $_{$_}=$_; print
0+keys %_,": ", join("", keys %_); }'
HASH_FUNCTION = ONE_AT_A_TIME_HARD HASH_SEED = 0xf5867c55039dc724
PERTURB_KEYS = 0 (NO)
1: A
2: AB
3: CAB
4: DCAB
5: DCEAB
6: DCEAFB
7: GDCEAFB
8: GHAEBDCF
9: IGHAEBDCF
10: IGHAEBDJCF
11: IGHAEKBDJCF
12: IGHAEKBDJLCF
13: IGHAEMKBDJLCF
14: IGHAEMKBNDJLCF
15: IGHAEMKBNDJLCFO
16: IEAKNPGHMBDJLCFO
17: IEAKNPGHMBDQJLCFO
18: IEAKRNPGHMBDQJLCFO
19: ISEAKRNPGHMBDQJLCFO
20: ISEAKRNPGHMBDQJLCFTO
21: ISEAKRNPGHUMBDQJLCFTO
22: ISEAKRNPGHVUMBDQJLCFTO
23: ISEAKRNPGWHVUMBDQJLCFTO
24: ISEAXKRNPGWHVUMBDQJLCFTO
25: ISEAXKRNPGYWHVUMBDQJLCFTO
26: ISEAXKRNPZGYWHVUMBDQJLCFTO

Do it again and note it does the same thing as before:

$ PERL_HASH_SEED_DEBUG=1 PERL_HASH_SEED=0xf5867c55039dc724
PERL_PERTURB_KEYS=0 ./perl -le'for ("A".."Z") { $_{$_}=$_; print
0+keys %_,": ", join("", keys %_); }'
HASH_FUNCTION = ONE_AT_A_TIME_HARD HASH_SEED = 0xf5867c55039dc724
PERTURB_KEYS = 0 (NO)
1: A
2: AB
3: CAB
4: DCAB
5: DCEAB
6: DCEAFB
7: GDCEAFB
8: GHAEBDCF
9: IGHAEBDCF
10: IGHAEBDJCF
11: IGHAEKBDJCF
12: IGHAEKBDJLCF
13: IGHAEMKBDJLCF
14: IGHAEMKBNDJLCF
15: IGHAEMKBNDJLCFO
16: IEAKNPGHMBDJLCFO
17: IEAKNPGHMBDQJLCFO
18: IEAKRNPGHMBDQJLCFO
19: ISEAKRNPGHMBDQJLCFO
20: ISEAKRNPGHMBDQJLCFTO
21: ISEAKRNPGHUMBDQJLCFTO
22: ISEAKRNPGHVUMBDQJLCFTO
23: ISEAKRNPGWHVUMBDQJLCFTO
24: ISEAXKRNPGWHVUMBDQJLCFTO
25: ISEAXKRNPGYWHVUMBDQJLCFTO
26: ISEAXKRNPZGYWHVUMBDQJLCFTO

Demonstrate the PERL_HASH_SEED=0 is the same as saying
PERL_HASH_SEED=0 PERL_PERTURB_KEYS=0:

$ PERL_HASH_SEED_DEBUG=1 PERL_HASH_SEED=0 ./perl -le'for ("A".."Z") {
$_{$_}=$_; print 0+keys %_,": ", join("", keys %_); }'
HASH_FUNCTION = ONE_AT_A_TIME_HARD HASH_SEED = 0x0000000000000000
PERTURB_KEYS = 0 (NO)
1: A
2: AB
3: ABC
4: ABDC
5: AEBDC
6: AEBDFC
7: GAEBDFC
8: ABEHDFCG
9: ABEHIDFCG
10: ABEHIDJFCG
11: ABEHKIDJFCG
12: ABEHKIDJFCGL
13: ABEHKIDJFCGML
14: NABEHKIDJFCGML
15: ONABEHKIDJFCGML
16: AFCGLONBEHKIDPJM
17: AQFCGLONBEHKIDPJM
18: AQFCGLONBEHRKIDPJM
19: AQFCGLONBEHRKIDPJMS
20: AQFCGLONBEHRKTIDPJMS
21: AQFCGLONBEHRKTIDUPJMS
22: AQFCGLVONBEHRKTIDUPJMS
23: AWQFCGLVONBEHRKTIDUPJMS
24: AWQFCGLVONBEHRKTIDUPJMSX
25: AWQFCGLVONYBEHRKTIDUPJMSX
26: AWQFCGLZVONYBEHRKTIDUPJMSX


--
perl -Mre=debug -e "/just|another|perl|hacker/"

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About