develooper Front page | perl.perl5.porters | Postings from November 2012

RE: Eliminating the "rehash" mechanism for 5.18

Thread Previous | Thread Next
From:
bulk 88
Date:
November 1, 2012 00:23
Subject:
RE: Eliminating the "rehash" mechanism for 5.18
Message ID:
COL115-W2163A2AD24521EEF259D09DF600@phx.gbl



----------------------------------------
> Date: Mon, 29 Oct 2012 18:55:21 +0100
> Subject: Eliminating the "rehash" mechanism for 5.18
> From: demerphq@gmail.com
> To: perl5-porters@perl.org
> CC: nick@ccl4.org; rjbs@manxome.org
>
> Perl currently has a defense against attacks on its hash algorithm
> where it conditionally enables the use of a random hash seed when it
> calculates the hash value for a string. This mode is triggered by
> encountering an overly long bucket chain during insert. Once this bit
> is set all further hash operations have to go through additional
> layers of logic. All of this support has a per-operation cost.
..................
> The rehash mechanism more or less works around the problem of hash
> order expectations by only enabling the randomization in degenerate
> hashes. But it achieves this at the cost of added per operation
> overhead in hot code paths and additional complexity in the code. I
> believe that this was done at the time because of time constraint
> issues for the 5.8.0 release and a lack of time to warn the community
> about it in advance.
....................
>
> Besides the advantages of avoiding the costs of the rehash mechanism
> this change will smoke out any hash order dependency bugs in the
> modules on CPAN, and in our own code. (I found at least two unrelated
> bugs in code in core due to hash randomization.) It will also prepare
> the ground for people to choose alternative hash algorithms for Perl
> to use, or to make it possible to use specialized hash algorithms in
> certain contexts such as 64 bit builds.
>
> The downside however will be that it almost certainly will make lots
> of lazily written code to fail. But IMO the cost is worth it.
>
> What do people think?

What will happen to XS code that uses the PERL_HASH macro?

I disagree there is a measurable cost to the current implementation of 
REHASH. If the rehash code is so rare, maybe it should be removed from 
hv_common and placed in its own func, but looking at the asm, the 
overhead is so tiny I dont think the rehash code even matters compared 
to the rest of the design of hv_common. I don't see any performance 
gains by removing it. In the past I've done benchmarking on pre 
calculating the hash number to give to hv_common_key_len. I couldn't 
get a consistent benchmark difference between with and without 
the hash number when calling hv_common_key_len in a loop. Having 
the PV in the HEK and the char * given hv_common_key_len be 
exactly the same gave a measurable improvement for me in the same 
hv_common_key_len in a loop benchmark situation. For the asm code,
 masked_flags (stack var reused) was earlier set to HvREHASH(hv) 
as an optimization by the compiler. The whole overhead in the 
hash algorithm macro to support rehash is the following. Just 
a t/f conditional between 2 32 bit read. ebx contains my_perl.

280131F3 83 7D F4 00      cmp         dword ptr [masked_flags],0 

280131F7 8B 4D 14         mov         ecx,dword ptr [key] 

280131FA 74 08            je          28013204 

280131FC 8B 83 B0 09 00 00 mov         eax,dword ptr [ebx+9B0h] 

28013202 EB 06            jmp         2801320A 

28013204 8B 83 AC 09 00 00 mov         eax,dword ptr [ebx+9ACh] 

"U32 hash_PeRlHaSh = (internal ? PL_rehash_seed : PL_hash_seed);"

Full block of asm from hv_common, blead, -01 VC 32 bits.

   616:     if (HvREHASH(hv) || (!hash && !(keysv && (SvIsCOW_shared_hash(keysv)))))
2801318E 8B 47 08         mov         eax,dword ptr [edi+8] 
28013191 25 00 00 00 01   and         eax,1000000h 
28013196 89 45 F4         mov         dword ptr [masked_flags],eax 
28013199 75 58            jne         280131F3 
2801319B 8B 4D 28         mov         ecx,dword ptr [hash] 
2801319E 85 C9            test        ecx,ecx 
280131A0 0F 85 94 00 00 00 jne         2801323A 
280131A6 8B 55 10         mov         edx,dword ptr [keysv] 
280131A9 85 D2            test        edx,edx 
280131AB 74 46            je          280131F3 
280131AD 8B 42 08         mov         eax,dword ptr [edx+8] 
280131B0 B9 00 00 00 09   mov         ecx,9000000h 
280131B5 8B F0            mov         esi,eax 
280131B7 23 F1            and         esi,ecx 
280131B9 3B F1            cmp         esi,ecx 
280131BB 75 36            jne         280131F3 
280131BD 8B C8            mov         ecx,eax 
280131BF 81 E1 00 C0 00 00 and         ecx,0C000h 
280131C5 81 F9 00 80 00 00 cmp         ecx,8000h 
280131CB 75 12            jne         280131DF 
280131CD 8B C8            mov         ecx,eax 
280131CF 81 E1 FF 00 00 00 and         ecx,0FFh 
280131D5 83 F9 09         cmp         ecx,9 
280131D8 74 19            je          280131F3 
280131DA 83 F9 0A         cmp         ecx,0Ah 
280131DD 74 14            je          280131F3 
280131DF 3C 08            cmp         al,8 
280131E1 74 10            je          280131F3 
280131E3 8B 02            mov         eax,dword ptr [edx] 
280131E5 83 78 0C 00      cmp         dword ptr [eax+0Ch],0 
280131E9 75 08            jne         280131F3 
   618:     else if (!hash)
   619:     hash = SvSHARED_HASH(keysv);
280131EB 8B 42 0C         mov         eax,dword ptr [edx+0Ch] 
280131EE 8B 48 F8         mov         ecx,dword ptr [eax-8] 
280131F1 EB 44            jmp         28013237 
   617:     PERL_HASH_INTERNAL_(hash, key, klen, HvREHASH(hv));
280131F3 83 7D F4 00      cmp         dword ptr [masked_flags],0 
280131F7 8B 4D 14         mov         ecx,dword ptr [key] 
280131FA 74 08            je          28013204 
280131FC 8B 83 B0 09 00 00 mov         eax,dword ptr [ebx+9B0h] 
28013202 EB 06            jmp         2801320A 
28013204 8B 83 AC 09 00 00 mov         eax,dword ptr [ebx+9ACh] 
2801320A 8B 55 18         mov         edx,dword ptr [klen] 
2801320D 85 D2            test        edx,edx 
2801320F 74 16            je          28013227 
28013211 0F B6 31         movzx       esi,byte ptr [ecx] 
28013214 03 C6            add         eax,esi 
28013216 69 C0 01 04 00 00 imul        eax,eax,401h 
2801321C 8B F0            mov         esi,eax 
2801321E C1 EE 06         shr         esi,6 
28013221 41               inc         ecx  
28013222 33 C6            xor         eax,esi 
28013224 4A               dec         edx  
28013225 75 EA            jne         28013211 
28013227 6B C0 09         imul        eax,eax,9 
2801322A 8B C8            mov         ecx,eax 
2801322C C1 E9 0B         shr         ecx,0Bh 
2801322F 33 C8            xor         ecx,eax 
28013231 69 C9 01 80 00 00 imul        ecx,ecx,8001h 
28013237 89 4D 28         mov         dword ptr [hash],ecx 
   620: 
   621:     /* We don't have a pointer to the hv, so we have to replicate the
   622:        flag into every HEK, so that hv_iterkeysv can see it.
   623:        And yes, you do need this even though you are not "storing" because
   624:        you can flip the flags below if doing an lval lookup.  (And that
   625:        was put in to give the semantics Andreas was expecting.)  */
   626:     if (HvREHASH(hv))
2801323A 83 7D F4 00      cmp         dword ptr [masked_flags],0 
2801323E 74 04            je          28013244 
   627:     flags |= HVhek_REHASH;
28013240 83 4D 1C 04      or          dword ptr [flags],4 
   628: 
   629:     masked_flags = (flags & HVhek_MASK);
28013244 8B 45 1C         mov         eax,dword ptr [flags] 
28013247 25 FF 00 00 00   and         eax,0FFh 
2801324C 89 45 F4         mov         dword ptr [masked_flags],eax 
 		 	   		  
Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About