How does Nginx look-up cached resource?

Maxim Dounin mdounin at mdounin.ru
Sun Sep 6 00:08:03 UTC 2015


Hello!

On Fri, Sep 04, 2015 at 11:00:58PM +0200, Sergey Brester wrote:

> On 04.09.2015 21:43, Maxim Dounin wrote:
> 
> >No one yet happened. And likely won't ever happen, as md5 is a
> >good hash function 128 bits wide, and it took many years to find
> >even a single collision of md5.
> 
> You confuse good for "collision-search algorithms" with a good in the sense
> of the "probability the collision can occur". A estimation of collision in
> sence of "collision-search algorithm" and co. implies the hashed string is
> unknown and for example it estimates attacks to find that (like brute,
> chosen-prefix etc).
> 
> I'm talking about the probability of incidence the same hash for two
> different cache keys.
> In addition, because of so-called birthday problem
> (https://en.wikipedia.org/wiki/Birthday_problem) we can increase this
> probability with at least comparable 64 bit for real random data (different
> length).

Well, not, I don't confuse anything.  For sure, brute force attack 
on a 128 bit hash requires approximately 2^64 attempts.

That is, a single nginx instance with 2^64 cached resources will 
likely show up a collision.  But that's not a number of resources 
you'll be able to store on a single node - in particular, because 
64-bit address space wouldn't be enough to address that many 
cached items.

To obtain a collision of a 128-bit hash with at least 1% 
probability, you'll need more than 10^18 resources cached on a 
single node, which is not even close to a something possible as 
well.

Assuming 1 billion of keys (which is way more than a single nginx 
node can handle now, and will require about 125G of memory for a 
cache keys zone), probability of a collision is less than 10^(-20).

Quoting https://en.wikipedia.org/wiki/Birthday_attack:

: For comparison, 10^(−18) to 10^(−15) is the uncorrectable bit 
: error rate of a typical hard disk.

That is, you are trying to fight a problem which is less probable 
than the chance that you'll get wrong data from your hard disk.

> Don't forget our keys, that will be hashed, are not really any "random" data
> - most of the time it contains only specified characters and/or has
> specified length.
> 
> So the probability that the collision will occur is still significant larger
> (a billion billion times larger).

This is not true as long as you are using a good enough hash 
function.  See https://en.wikipedia.org/wiki/Hash_function#Uniformity 
for details.

> >And even if it'll happen, we have
> >crc32 check in place to protect us.
> 
> Very funny... You make such conclusions based on what?
> So last but not least, if you still haven't seen the collision in sence of
> md5 "protected" crc32, how can you be sure, that this is still not occurred?
> 
> For example, how large you will estimate the probability that the collision
> will occur, if my keys will contain only exact 32 characters in range
> [0-9A-Za-z]? And it frequency? Just approximately dimension...

See above for detailed numbers.

-- 
Maxim Dounin
http://nginx.org/



More information about the nginx-devel mailing list