
Re: what evidence means, and actual complexity attack examples

From: Reini Urban
Date: July 2, 2014 16:17
Subject: Re: what evidence means, and actual complexity attack examples
Message ID: CAHiT=DF=ba9SFriJ0GsRprFUAtNUJuoLttLcubiGceeT+Js=0A@mail.gmail.com

David,

Since Marc was mostly reiterating my points, and since you still haven't
understood them, and I still see no sign that any committer has understood
the issues, I have to summarize them again (for the fifth time?).

Yves did a fine job improving two parts of hash security in 5.18.
But in the end he overdid it, because he did not understand how many
bits of a hash function are actually used in a hash table.

The biggest improvement was randomizing iteration with a different
seed than the hash seed.

Another fix was reinstating the 5.8.0 rehashing approach, because the
5.8.1 rehashing optimization didn't protect against simple \0 collision
attacks. But adding the key length to the hash does not help either:
since key length is not limited, this is trivially exploited.

Another change was to use Jenkins' real (but also rather old) one-at-a-time
(OOAT) hash function, with more turbulence in the finalization, instead of
the bastardized variant we had used for years; Jenkins never wrote the
function we used. Technically our bastard worked well enough, and using
the real OOAT didn't help against the simple \0 attack.
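
For reference, Jenkins' classic one-at-a-time hash looks roughly like the
sketch below; the seeded variant actually shipped in hv_func.c differs in
its exact seeding and finalization, so treat this only as an illustration
of the structure:

    #include <stdint.h>
    #include <stddef.h>

    /* Sketch of Jenkins' one-at-a-time (OOAT) hash. The hv_func.c
     * variant adds a seed and extra finalization ("turbulence"). */
    static uint32_t
    oaat_hash(const unsigned char *key, size_t len, uint32_t seed)
    {
        uint32_t hash = seed;
        size_t   i;
        for (i = 0; i < len; i++) {
            hash += key[i];
            hash += hash << 10;
            hash ^= hash >> 6;
        }
        /* finalization: mix the remaining state */
        hash += hash << 3;
        hash ^= hash >> 11;
        hash += hash << 15;
        return hash;
    }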

So he searched for more secure hash functions, not understanding that
hash functions used in hash tables are by definition insecure. I
didn't see any other committer or so-called "security expert" at p5p
understanding this important point (besides me, Marc and Andi Koenig).

hv.c:631
       entry = (HvARRAY(hv))[hash & (I32) HvMAX(hv)];

The hash function returns a 32-bit value which is AND'ed with the size
mask of the hash array, HvMAX, which is of the form 2^n - 1, so in binary
it is all 1-bits. The most common hash table sizes are 7 or 63, and the
security impact is independent of the hash table size; what matters is
only the number of collisions which can be created at one bucket.
So with a hash table size of 63 only the low 6 bits (mask 0b111111) of
the hash function are used, and the rest at the front is thrown away. At
most 32 bits of the hash value are used, because that is our maximum size.
A secure hash function that helps prevent collisions starts at 256 bits
of output; earlier, failed attempts started with 64 and 128 bits. 32 bits
was never considered "secure": collisions for a 32-bit hash are trivial
to generate by brute force.
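
To make the masking concrete, here is a minimal sketch (the two hash
values are picked for illustration, they are not produced by Perl's hash
function): any two 32-bit hashes that agree in their low 6 bits land in
the same bucket of a 64-slot table, no matter how random the high bits are.

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t hvmax = 63;          /* 64 buckets, mask 0b111111 */
        /* Two arbitrary 32-bit hash values that differ in the high
         * bits but share the same low 6 bits (illustrative only). */
        uint32_t h1 = 0xDEADBEC1;
        uint32_t h2 = 0x12345601;

        printf("bucket(h1) = %u\n", (unsigned)(h1 & hvmax));   /* 1 */
        printf("bucket(h2) = %u\n", (unsigned)(h2 & hvmax));   /* 1 */
        return 0;
    }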

Any talk about secure hash functions to be used in hash tables is therefore
misguided. All the fuss about the new hv_func.c choice is nonsense: we don't
need secure hash functions, since they are not used securely (the high bits
are AND'ed away); we need good hash tables.

A good hash function only helps in the average case, which is what I studied at
https://github.com/rurban/perl-hash-stats
The bad case, an attack, cannot be helped by the hash function. At minimum
a hash function that is not invariant under appended \0 bytes should be
used, otherwise you just attack it with keys like "a" . "\0" x $i++
(I called it "Ruslan's attack"), which all collide to the same hash under
a simple hash function.
The bad cases can be avoided by the measures we already use.
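
To illustrate the shape of that weakness, here is a deliberately weak toy
hash that is invariant under appended \0 bytes (a plain byte sum; this is
not Perl's hash function, just a sketch of the attack pattern): every key
built by appending more NUL bytes gets the same hash and therefore the
same bucket.

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Toy hash, invariant under appended \0 bytes (NOT Perl's hash). */
    static uint32_t toy_hash(const unsigned char *key, size_t len)
    {
        uint32_t h = 0;
        for (size_t i = 0; i < len; i++)
            h += key[i];
        return h;
    }

    int main(void)
    {
        unsigned char key[16] = "a";      /* "a", "a\0", "a\0\0", ... */
        for (size_t extra = 0; extra < 4; extra++) {
            size_t len = 1 + extra;       /* "a" plus `extra` NUL bytes */
            printf("len %zu -> hash %u\n", len,
                   (unsigned)toy_hash(key, len));   /* always 97 */
        }
        return 0;
    }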

We chose a random seed over uhash in 2002 when the attack was
published, even though uhash was recommended by the attack's author. This
seed helped until major services were easily attacked (not ours, because
we had the random seed). Personally I chose the good old 5.8.0 rehashing
over the 5.8.2 optimization for our cPanel 5.6.2 hashes a long time
before p5p found out about the rehashing problem in 5.8.2 - 5.16.

To prevent algorithmic attacks you need to protect the seed (which
Yves did now) and/or use different hash table algorithms, which are
faster and not attackable in this way. Or use simpler measures, as PHP
did with MAX_POST_SIZE or djbdns did with rate throttling. p5p chose not to
investigate, because they had no time, and probably because our hash table
abstractions suck and get worse and worse.
The argument that rehashing the buckets in our linked list is "better"
than in most other hash tables is not correct. Only double hashing
has slower rehashing, but it has faster hash access overall.

I tried to rewrite at least the bucket search part, because this was
the most horrible part (4 comparisons instead of one), but I had to find
out that it is duplicated in 5 different places, where normally you need
it only in one:

for (; entry; entry = HeNEXT(entry)) {
    if (HeHASH(entry) != hash)          /* strings can't be equal */
        continue;
    if (HeKLEN(entry) != (I32)klen)
        continue;
    if (HeKEY(entry) != key && memNE(HeKEY(entry),key,klen))  /* is this it? */
        continue;
    if ((HeKFLAGS(entry) ^ masked_flags) & HVhek_UTF8)
        continue;
=> Proposal: create a hecmp sentinel in advance and do a single
memNE(he, &hecmp, helen) in the loop.
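
A minimal sketch of what that sentinel idea could look like, assuming a
hypothetical entry header in which hash, key length and the UTF8 flag sit
contiguously (the names hecmp_t, make_hecmp and he_matches are made up for
illustration; this is not how hv.c lays out HE/HEK today):

    #include <string.h>
    #include <stdint.h>

    /* Hypothetical packed header used for one comparison; hash, length
     * and the UTF8 flag bit would have to be adjacent in the entry. */
    typedef struct {
        uint32_t hash;
        uint32_t klen;
        uint32_t utf8;      /* only the HVhek_UTF8-equivalent bit */
    } hecmp_t;

    /* Build the sentinel once, before walking the bucket chain. */
    static hecmp_t
    make_hecmp(uint32_t hash, uint32_t klen, uint32_t utf8)
    {
        hecmp_t c = { hash, klen, utf8 };
        return c;
    }

    /* Inside the loop: one header compare and, only on a header match,
     * one key compare, instead of four separate branches per entry. */
    static int
    he_matches(const hecmp_t *entry_hdr, const char *entry_key,
               const hecmp_t *probe_hdr, const char *probe_key)
    {
        if (memcmp(entry_hdr, probe_hdr, sizeof(*probe_hdr)) != 0)
            return 0;
        return memcmp(entry_key, probe_key, probe_hdr->klen) == 0;
    }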

But based on incompetent and abusive feedback (and horrible code) I gave up.
In the end we now have more secure hashes, which are much slower than
needed. And we have committers and implementers who still have no idea
how hash functions work or how a good hash table works, and who are
still hostile towards critics. Those critics could provide better
implementations, but are scared away by a circus of loud, arrogant and
incompetent p5p followers. ...
Also, criticising a commit will lead to even worse follow-up commits,
so it's better to stay silent to avoid even more harm being done.
Get your community under control and maybe the technical parts will
get better.

https://github.com/rurban/perl-hash-stats
https://github.com/rurban/perl/commits/rurban/hash-sortbuckets
https://github.com/rurban/smhasher

-- 
Reini Urban
http://cpanel.net/   http://www.perl-compiler.org/
