How does Nginx look-up cached resource?
Maxim Dounin
mdounin at mdounin.ru
Sun Sep 6 00:08:03 UTC 2015
Hello!
On Fri, Sep 04, 2015 at 11:00:58PM +0200, Sergey Brester wrote:
> On 04.09.2015 21:43, Maxim Dounin wrote:
>
> >No one yet happened. And likely won't ever happen, as md5 is a
> >good hash function 128 bits wide, and it took many years to find
> >even a single collision of md5.
>
> You confuse good for "collision-search algorithms" with a good in the sense
> of the "probability the collision can occur". A estimation of collision in
> sence of "collision-search algorithm" and co. implies the hashed string is
> unknown and for example it estimates attacks to find that (like brute,
> chosen-prefix etc).
>
> I'm talking about the probability of incidence the same hash for two
> different cache keys.
> In addition, because of so-called birthday problem
> (https://en.wikipedia.org/wiki/Birthday_problem) we can increase this
> probability with at least comparable 64 bit for real random data (different
> length).
Well, not, I don't confuse anything. For sure, brute force attack
on a 128 bit hash requires approximately 2^64 attempts.
That is, a single nginx instance with 2^64 cached resources will
likely show up a collision. But that's not a number of resources
you'll be able to store on a single node - in particular, because
64-bit address space wouldn't be enough to address that many
cached items.
To obtain a collision of a 128-bit hash with at least 1%
probability, you'll need more than 10^18 resources cached on a
single node, which is not even close to a something possible as
well.
Assuming 1 billion of keys (which is way more than a single nginx
node can handle now, and will require about 125G of memory for a
cache keys zone), probability of a collision is less than 10^(-20).
Quoting https://en.wikipedia.org/wiki/Birthday_attack:
: For comparison, 10^(−18) to 10^(−15) is the uncorrectable bit
: error rate of a typical hard disk.
That is, you are trying to fight a problem which is less probable
than the chance that you'll get wrong data from your hard disk.
> Don't forget our keys, that will be hashed, are not really any "random" data
> - most of the time it contains only specified characters and/or has
> specified length.
>
> So the probability that the collision will occur is still significant larger
> (a billion billion times larger).
This is not true as long as you are using a good enough hash
function. See https://en.wikipedia.org/wiki/Hash_function#Uniformity
for details.
> >And even if it'll happen, we have
> >crc32 check in place to protect us.
>
> Very funny... You make such conclusions based on what?
> So last but not least, if you still haven't seen the collision in sence of
> md5 "protected" crc32, how can you be sure, that this is still not occurred?
>
> For example, how large you will estimate the probability that the collision
> will occur, if my keys will contain only exact 32 characters in range
> [0-9A-Za-z]? And it frequency? Just approximately dimension...
See above for detailed numbers.
--
Maxim Dounin
http://nginx.org/
More information about the nginx-devel
mailing list