develooper Front page | perl.perl5.porters | Postings from October 2003

Re: Perl 5.8.1, plan C, and algorithmic complexity attacks.

Thread Previous | Thread Next
Nicholas Clark
October 22, 2003 03:35
Re: Perl 5.8.1, plan C, and algorithmic complexity attacks.
Message ID:
On Wed, Oct 22, 2003 at 04:44:08AM -0500, Scott A Crosby wrote:
> On Tue, 21 Oct 2003 22:44:48 +0100, Nicholas Clark <> writes:
> > On Tue, Oct 21, 2003 at 04:26:27PM -0500, Scott A Crosby wrote:
> > 
> > > AFAIK, you switch from the past predictable hash function to a new
> > > randomized one only when you detect that you're under attack. I have a
> > > couple of concerns:
> > > 
> > >   1. How do you dectect 'under attack', and is it possible to
> > >   construct an attack while not triggering the rekey & rehash?
> > 
> I dislike switching algorithms like this proposal because it is too
> easy to make a mistake and not accurately detect when you're under
> attack. For instance, the proposal below has two mistakes, one that
> someone else already noticed, and a different one that I haven't seen
> anyone mention. 
> If you need to do a scheme like this for binary compatibility, I hope
> you rip it out in the next major version and just randomize everything.

So do I, but this is Hugo's call. (I'd not said this yet, but it occurred
to me a few days ago. We should promise that PL_hash_seed will be 0 for
all of 5.8.x and 5.10 if 5.10 maintains binary compatibility. This isn't
quite what Chip proposed.)

> > Currently the criteria is that if the longest linked list in the hash
> > contains >50% of the entries then the rehashing is triggered
> This can be avoided by an attack where the attack inputs hash to two
> distinct values. That way, each chain is only half as long, but we
> avoid the trigger to randomly rekey and rehash. The attack exists with
> its effectiveness reduced by only a half.

I think I said this in that as currently implemented splitting between
3 buckets will get past it. (I didn't say 2 as I couldn't remember whether
it was < or <= and wasn't sure which side the integers would fall).
Currently the cut in logic is all in one if statement. It is easy to change.
I'm not yet sure of what is a better value to change it to
(assuming that it stays there)

My arm wavy guessing makes me think that for a large hash long lists
(relative to the total number of keys) are unlikely by chance, but for a
small hash (ie a large hash as it is being filled) long lists are far more
likely by chance (but will go away as more keys are added). Given that the
re-hashing is one way, doesn't get turned off until the hash is emptied,
and is a possible speed hit, if my guessing is correct then we don't want
to turn on rehashing too early.

> > (strictly this is at a hash split, when the number of buckets is doubled,
> > and it is skipped if the longest linked is < 8, or we're already on
> > the alternative hashing function)
> This check is also easy to avoid. If you only check 'under attack'
> when the number of keys doubles during a rehash, the attacker can
> 'preload' the hash table with $2^n+3$ keys, perfectly distributed
> among all hash values, then send $2^n-3$ attack keys. Since the hash
> table won't rehash until $2*2^n$ keys are inserted, it won't notice
> its under attack until the $2^n-3$ are all inserted, costing quadratic
> time. Thus, the attack still exists; its effectiveness is only reduced
> by three quarters.

Not thought of that.

> If you go this way where you change the hash function when you detect
> attack, you need to do that check for too long chains on every insert.

I'd already thought (can't remember if I'd said) that I'd like to change the
check to be on the length of any linked list exceeding some expected value,
so that it is checked on every new insert. It is trivially cheap to count
along this list as part of the existing scan, and overlong lists (rather than
any other metric) is the real worst case scenario.

I concur that doing the check this way would beat your new attack.

The important question is then what is probability of random data generating
a chain of length n. (which I suspect is also a function of number of hash

Then I guess we pick a probability at which we cut in, find the length
corresponding to that probability, and hard code that length. (Or some
simple formula that gives it - we only need to calculate it when we lengthen
a list)

I suspect that my maths isn't good enough to do this calculation reliably.

> > More maths or experimentation on what the "attack" criteria should be is
> > welcome.
> Well, there's always regexp DoS's. :) PCRE has a way to limit its node
> expansiosn before it gives up, as a brake on this sort of attack.

Sounds good, but I have no idea how perl's regexp engine works.
Hugo has re-writing it to avoid C stack recursion as part of his plans for
5.10. He'd have a much better idea about that than I would.

Thanks for the advice. It's useful.

Nicholas Clark

Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About