# How does Nginx look-up cached resource?

Maxim Dounin mdounin at mdounin.ru
Sun Sep 6 00:08:03 UTC 2015

```Hello!

On Fri, Sep 04, 2015 at 11:00:58PM +0200, Sergey Brester wrote:

> On 04.09.2015 21:43, Maxim Dounin wrote:
>
> >No one yet happened. And likely won't ever happen, as md5 is a
> >good hash function 128 bits wide, and it took many years to find
> >even a single collision of md5.
>
> You confuse good for "collision-search algorithms" with a good in the sense
> of the "probability the collision can occur". A estimation of collision in
> sence of "collision-search algorithm" and co. implies the hashed string is
> unknown and for example it estimates attacks to find that (like brute,
> chosen-prefix etc).
>
> I'm talking about the probability of incidence the same hash for two
> different cache keys.
> In addition, because of so-called birthday problem
> (https://en.wikipedia.org/wiki/Birthday_problem) we can increase this
> probability with at least comparable 64 bit for real random data (different
> length).

Well, not, I don't confuse anything.  For sure, brute force attack
on a 128 bit hash requires approximately 2^64 attempts.

That is, a single nginx instance with 2^64 cached resources will
likely show up a collision.  But that's not a number of resources
you'll be able to store on a single node - in particular, because
cached items.

To obtain a collision of a 128-bit hash with at least 1%
probability, you'll need more than 10^18 resources cached on a
single node, which is not even close to a something possible as
well.

Assuming 1 billion of keys (which is way more than a single nginx
node can handle now, and will require about 125G of memory for a
cache keys zone), probability of a collision is less than 10^(-20).

Quoting https://en.wikipedia.org/wiki/Birthday_attack:

: For comparison, 10^(−18) to 10^(−15) is the uncorrectable bit
: error rate of a typical hard disk.

That is, you are trying to fight a problem which is less probable
than the chance that you'll get wrong data from your hard disk.

> Don't forget our keys, that will be hashed, are not really any "random" data
> - most of the time it contains only specified characters and/or has
> specified length.
>
> So the probability that the collision will occur is still significant larger
> (a billion billion times larger).

This is not true as long as you are using a good enough hash
function.  See https://en.wikipedia.org/wiki/Hash_function#Uniformity
for details.

> >And even if it'll happen, we have
> >crc32 check in place to protect us.
>
> Very funny... You make such conclusions based on what?
> So last but not least, if you still haven't seen the collision in sence of
> md5 "protected" crc32, how can you be sure, that this is still not occurred?
>
> For example, how large you will estimate the probability that the collision
> will occur, if my keys will contain only exact 32 characters in range
> [0-9A-Za-z]? And it frequency? Just approximately dimension...

See above for detailed numbers.

--
Maxim Dounin
http://nginx.org/

```