develooper Front page | perl.perl5.porters | Postings from April 2014

hash table security confusion

Thread Next
Reini Urban
April 30, 2014 19:06
hash table security confusion
Message ID:
With more and more surprise I'm seeing work being done on the wrong
solutions for a simple perl5 hash table problem.

I've wrote a blog post but nobody seems to be able to understand it.
I've added several git repos and one with a huge description of the
problem and solutions. Again nobody is able to understand the problem
and the solutions.
I see my fixes being added without citing me. And the typical name
calling and even worse.

Again. Our worst case we need to be protected against is the collision
attack. Searching a linear list of colliding buckets deviates to
O(n/2). Furthermore searching linear lists is not cache friendly.
That's why most others switched to open-addressing. With collision
attacks open-addressing deviates to rehashing.

Our avg case is now being slowed down even more because Yves thinks
that only using more secure hash function will help against the linked
list attack.
He doesn't find the idea attractive to use a sorted list, or a tree,
or universal hashing or maybe double hashing, which also protects
against such collision attacks somewhat. Not fully though. And I don't
believe in uhash that much, because this info could be exposed also.

Securing the hash function does not help at all against collisions.
There will always be collisions, and since they can be precomputed and
since several bits of the random seed can be broken by exposing too
much knowledge (see several papers online), or brute-forced it does
not help at all.
The typical largest hash tables with collision impact uses 15bit.
Brute forcing 15bits is a matter of hours. No need to call the NSA or
start your massive GPU cluster. No need to use even more secure and
slower hash functions when only the last 15 bit matter.

The random seed matters. Not exposing the seed and sort-order matters.
But over all the collision search matters.

Forget aes or siphash. Use a simple and fast hash function to help the
avg case, and use proper collision strategies to fight the attack.

Even if CRC32 is trivial to break - (the easiest) - it is still the
best solution on core2 CPU's in my tests.

And the current hv.c is such a big mess, that it hurts the eyes:
slow and insecure by design,
he "hash entry" (to traverse lists and hold a val) vs hek (+ "key"),
shared_he for the global strtab, refcounted_he are different
structures and of course code duplication,
abnormal slow collision searching (four compares instead of one),
cache unfriendly to the extreme,
iters broken by design.

What a mess. I never saw so terrible code, not even in PHP.
Larry's original code looked good though.

python got their double hashing pretty nice, only a bit awkward.

ruby uses 1985-style linked lists, but still maintainable.
only linked lists but at least sane code and fast comparisons.

php hashes are also similar to ruby:

In all langs no attempts for hash thread-safety, no optimizations, no
security other than random seeds. At least python uses a simple double
hashing which offers basic security, but not a good one.
Better implementations are to be found in the linux kernel, glibc,
libliberty, gcc and most self-respecting languages.

I'm slowly rewriting the mess. The usual unfriendly replies are filtered.
Reini Urban

Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About