On Wed, 22 Oct 2003 11:29:17 +0100, Nicholas Clark <nick@ccl4.org> writes:

> > If you need to do a scheme like this for binary compatibility, I hope
> > you rip it out in the next major version and just randomize everything.
>
> So do I, but this is Hugo's call. (I'd not said this yet, but it occurred
> to me a few days ago. We should promise that PL_hash_seed will be 0 for
> all of 5.8.x and 5.10 if 5.10 maintains binary compatibility. This isn't
> quite what Chip proposed.)

Well, I'm glad I'm not maintaining it. I do think that the scheme will
work securely if the two problems get fixed.

> > > Currently the criteria is that if the longest linked list in the hash
> > > contains >50% of the entries then the rehashing is triggered
> >
> > This can be avoided by an attack where the attack inputs hash to two
> > distinct values. That way, each chain is only half as long, but we
> > avoid the trigger to randomly rekey and rehash. The attack exists with
> > its effectiveness reduced by only a half.
>
> I think I said this in that as currently implemented splitting between
> 3 buckets will get past it. (I didn't say 2 as I couldn't remember whether
> it was < or <= and wasn't sure which side the integers would fall).

I'd do it by sending 2% random keys first, then alternating between keys
with a hash value of 0 and keys with a hash value of 1 for the other 98%.
This new attack gets past the defense with only a 50% degradation in
efficiency, since neither of the two chains ever holds more than half the
entries. Splitting between 3 different hash values, as you suggest, is a
third less effective than this attack: each chain then holds only 33% of
the keys.
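To make that concrete, here is roughly how I'd generate the keys. This is
only a sketch: I'm using the generic Jenkins one-at-a-time hash as a
stand-in for whatever PERL_HASH actually computes, and the key format,
bucket count, and quota are invented for illustration.

    /* Sketch: brute-force keys that land in bucket 0 or bucket 1 under
     * a one-at-a-time style hash with a known seed.  The hash below is
     * the generic Jenkins function, a stand-in for PERL_HASH; the key
     * format and the constants are made up. */
    #include <stdio.h>
    #include <stdint.h>

    static uint32_t oaat(const char *s, uint32_t seed)
    {
        uint32_t h = seed;
        while (*s) {
            h += (uint8_t)*s++;
            h += h << 10;
            h ^= h >> 6;
        }
        h += h << 3;
        h ^= h >> 11;
        h += h << 15;
        return h;
    }

    int main(void)
    {
        const uint32_t seed = 0;         /* PL_hash_seed promised to be 0 */
        const uint32_t nbuckets = 1024;  /* perl keeps this a power of 2 */
        char key[32];
        unsigned long i, found = 0;

        for (i = 0; found < 1000; i++) {
            snprintf(key, sizeof key, "k%lu", i);
            uint32_t bucket = oaat(key, seed) & (nbuckets - 1);
            if (bucket <= 1) {           /* two chains, neither >50% */
                printf("%s -> bucket %u\n", key, bucket);
                found++;
            }
        }
        return 0;
    }

Interleave the keys it emits and neither chain ever holds more than half
the entries, so the >50% trigger never fires.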
> My arm wavy guessing makes me think that for a large hash long lists
> (relative to the total number of keys) are unlikely by chance, but for a
> small hash (ie a large hash as it is being filled) long lists are far more
> likely by chance (but will go away as more keys are added).
>
> The important question is then what is probability of random data generating
> a chain of length n. (which I suspect is also a function of number of hash
> entries)

I believe the variance in chain length decreases as the number of keys and
buckets increases. I used to know how to do this in closed form. My
intuition says that if the number of buckets and keys is $n$, then the
probability of a chain of length $k$ ($k > 2$) falls off exponentially as
$k$ increases. Thus for a good test I'd suggest something like

    5 * log_2(n)

Or we find someone to look up the formula. :)
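For the record, the closed form I was half-remembering is the standard
balls-in-bins union bound (this assumes a uniform hash, nothing
perl-specific): with $n$ keys thrown into $n$ buckets,

$$ \Pr[\text{some chain has length} \ge k] \;\le\; n \binom{n}{k}
\left(\frac{1}{n}\right)^{k} \;\le\; \frac{n}{k!}, $$

which is already negligible once $k!$ outgrows $n$; the longest chain is
$\Theta(\log n / \log\log n)$ with high probability, so a cutoff of
$5 \log_2 n$ sits far above anything random data will produce.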
> > If you go this way where you change the hash function when you detect
> > attack, you need to do that check for too long chains on every insert.
>
> I'd already thought (can't remember if I'd said) that I'd like to change the
> check to be on the length of any linked list exceeding some expected value,
> so that it is checked on every new insert. It is trivially cheap to count
> along this list as part of the existing scan, and overlong lists (rather than
> any other metric) is the real worst case scenario.

Yup.
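In outline I'd expect the insert path to look something like this. A
sketch only: the types and names are mine, not the real perl internals,
and the cutoff is the $5 \log_2 n$ guess from above.

    /* Sketch: the length check rides along on the key-comparison scan
     * that the insert already does, so it costs nothing extra.  HE and
     * the cutoff are stand-ins, not perl's real internals. */
    #include <math.h>
    #include <string.h>

    typedef struct he {
        struct he *next;
        const char *key;
        void       *val;
    } HE;

    /* Returns nonzero if this insert should trigger a rekey + rehash. */
    static int insert_and_check(HE **bucket, HE *entry, unsigned long nkeys)
    {
        unsigned long len = 0;
        HE *he;

        for (he = *bucket; he; he = he->next) {
            len++;                       /* free: same walk as the key search */
            if (strcmp(he->key, entry->key) == 0) {
                he->val = entry->val;    /* existing key: just overwrite */
                return 0;
            }
        }
        entry->next = *bucket;           /* new key: push onto the chain */
        *bucket = entry;
        /* +2 keeps the cutoff sane for tiny hashes */
        return (double)(len + 1) > 5.0 * log2((double)nkeys + 2.0);
    }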
> I concur that doing the check this way would beat your new attack.

Yup. If you make these changes to Plan C, I think you will again be secure.

Scott