David,

Since Marc was mostly reiterating my points, you still haven't understood them, and I still see no sign that any committer has understood the issues, I have to summarize them again (the fifth time?).

Yves did a fine job improving two parts of hash security for 5.18. But in the end he overdid it, because he did not understand how many bits of a hash function are actually used in a hash table.

The biggest improvement was randomizing iteration with a different seed than the hash seed. Another fix was reinstating the 5.8.0 rehashing approach, because the 5.8.1 rehashing optimization did not hold up against simple \0 collision attacks. But adding the length to the hash does not help: that is trivially exploited, since there is no limit on key length.

Another change was to use Jenkins' real, but also rather old, one-at-a-time (OAAT) hash function with more mixing at the end, instead of the bastardized variant we had used for years; Jenkins never wrote the function we used. Technically our bastard worked well enough, and using the real OAAT did not help against the simple \0 attack.

So he went searching for more secure hash functions, not understanding that hash functions used in hash tables are by definition insecure. I have not seen any other committer or so-called "security expert" on p5p understand this important point (besides me, Marc and Andi Koenig).

hv.c:631

    entry = (HvARRAY(hv))[hash & (I32) HvMAX(hv)];

The hash function returns a 32-bit value, which is AND-ed with the size of the hash array, HvMAX. HvMAX is of the form 2^n - 1, so it always ends in 1 bits. The most common hash table sizes are 7 or 63, and the security impact is independent of the table size; the size only determines how many collisions can be created at one bucket. With a hash table size of 63, only the last 0b111111 bits (6 bits) of the hash value are used; everything in front is thrown away. At most we use 32 bits of the hash function, because that is our maximum table size.
A secure hash function, one that actually helps prevent collisions, starts at 256 bits; earlier, failed attempts started with 64 and 128 bits. 32 bits was never considered "secure": collisions in 32-bit hashes are trivial to generate by brute force. Any talk about secure hash functions being used in hash tables is wrong.

All the fuss about the new hv_func.c choice is nonsense. We do not need secure hash functions, because they are not used securely (the bits are AND-ed away); we need good hash tables. A good hash function only helps in the average case, which I studied at https://github.com/rurban/perl-hash-stats

The bad case, an attack, cannot be helped by the hash function. The function must at least not be invariant under appended zero bytes ("zero-invariant"); otherwise you attack it simply with "a" . "\0" x $i++ (I called it "Ruslan's attack"), where every key collides to the same hash value with a simple hash function.

The bad cases can be avoided by the measures we already use. We chose a random seed over uhash in 2002 when the attack was published and uhash was recommended by the attack's author. This seed helped until major services were easily attacked (not ours, because we had the random seed). Personally, I chose the good old 5.8.0 rehashing over the 5.8.2 optimization for our cPanel 5.6.2 hashes long before p5p found out about the rehashing problem in 5.8.2 - 5.16.

To prevent algorithmic attacks you need to protect the seed (which Yves has now done) and/or use different hash table algorithms, which are faster and not attackable in this way. Or use simpler measures, as PHP does with MAX_POST_SIZE, or rate throttling as in djbdns. p5p chose not to investigate, because they had no time, and probably because our hash table abstractions suck and keep getting worse.

The argument that rehashing the buckets in our linked list is "better" than in most other hash tables is not correct. Only double hashing has slower rehashing, but it has faster hash access overall.
I tried to rewrite at least the bucket search part, because it was the most horrible part (4 comparisons instead of one), but found out that it is duplicated in 5 different places, where normally you need it in only one:

    for (; entry; entry = HeNEXT(entry)) {
        if (HeHASH(entry) != hash)          /* strings can't be equal */
            continue;
        if (HeKLEN(entry) != (I32)klen)
            continue;
        if (HeKEY(entry) != key && memNE(HeKEY(entry),key,klen)) /* is this it? */
            continue;
        if ((HeKFLAGS(entry) ^ masked_flags) & HVhek_UTF8)
            continue;

=> Create a hecmp sentinel in advance and do a single memNE(he, &hecmp, helen); in the loop.

But based on incompetent and abusive feedback (and horrible code) I gave up.

In the end we now have more secure hashes, which are much slower than needed. And we have committers and implementers who still have no idea how hash functions work or how a good hash table works, and who are still hostile to critics. The people who could provide better implementations are scared away by a circus of loud, arrogant and incompetent p5p followers. ... And criticizing a commit will only lead to even worse follow-up commits, so it is better to stay silent to avoid even more harm being done. Get your community under control and maybe the technical parts will get better.

https://github.com/rurban/perl-hash-stats
https://github.com/rurban/perl/commits/rurban/hash-sortbuckets
https://github.com/rurban/smhasher
--
Reini Urban
http://cpanel.net/ http://www.perl-compiler.org/