How does Nginx look up a cached resource?

Maxim Dounin mdounin at mdounin.ru
Fri Sep 4 18:10:15 UTC 2015


Hello!

On Fri, Sep 04, 2015 at 05:37:14PM +0200, Sergey Brester wrote:

> On 04.09.2015 15:23, Maxim Dounin wrote:
> 
> >On Thu, Sep 03, 2015 at 06:39:49PM -0700, Shuxin Yang wrote:
> >
> >...
> >>If so, how can we guarantee that crc32 and md5 combined can uniquely
> >>identify a resource?
> >
> >We can't. Collisions are unavoidable if you use a hash function
> >with more inputs than outputs. The question is how often
> >collisions are observed in practice.
> 
> Well, but we can (I hope): the original key (not its hash, but the key
> itself, as set with `proxy_cache_key`, `fastcgi_cache_key`, etc.) is
> saved in the header of each cached file (see KEY: ...).
> So it can also be validated directly after the hash entry is found
> (compare the original key once the hash entry is found).
> If there is a collision on both hash values (the original key does not
> match), it should just report "not cached" (and later overwrite the
> "wrong" cache entry with the colliding one; this will happen very rarely).
> 
> In this case it is really safe (but a little slower, because the original
> key is also compared each time).
> But I hope that it works exactly like this (I must review the source code),
> because if not, it's very VERY evil.
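
To make the proposal above concrete, here is a minimal sketch in Python.
It is not nginx code: the class `KeyCheckedCache` and its `digest_bytes`
parameter are hypothetical, introduced only so that a collision can be
forced by truncating the digest.  The point is simply that the full key
is stored next to the body and compared after the digest lookup, so a
colliding entry is treated as a miss.

```python
import hashlib


class KeyCheckedCache:
    """Toy in-memory cache indexed by an md5 digest of the key.

    The original key is stored alongside the body, playing the role of the
    KEY line in a cached file's header (an analogy, not nginx's format).
    """

    def __init__(self, digest_bytes=16):
        # Hypothetical knob: values below 16 truncate the digest so that
        # collisions are easy to provoke in a demonstration.
        self.digest_bytes = digest_bytes
        self.entries = {}

    def _index(self, key):
        return hashlib.md5(key.encode("utf-8")).digest()[: self.digest_bytes]

    def store(self, key, body):
        # A colliding key simply overwrites the previous entry.
        self.entries[self._index(key)] = (key, body)

    def lookup(self, key):
        entry = self.entries.get(self._index(key))
        if entry is None:
            return None  # no entry for this digest
        stored_key, body = entry
        if stored_key != key:
            return None  # digest collision: report "not cached"
        return body
```

With `digest_bytes=0` every key maps to the same slot, so after storing
"/a" a lookup of "/b" finds the entry, sees that the stored key differs,
and returns None instead of the wrong body.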

For sure this is something that can be done.  The question remains
though: how often are collisions observed in practice, and does it
make sense to do anything additional to protect against collisions
and spend resources on it?  Even considering only md5, without the
crc32 check, no practical cases have been reported so far.
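
For a rough sense of scale, the usual birthday-bound approximation can
be evaluated directly.  The sketch below assumes that cache keys hash
like uniformly random inputs (it says nothing about deliberately
constructed md5 collisions), and the function name
`collision_probability` is illustrative only.

```python
import math


def collision_probability(n_keys, hash_bits=128):
    """Approximate chance that any two of n_keys share a digest.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / 2^(bits+1)),
    which assumes uniformly distributed hash output.
    """
    exponent = -n_keys * (n_keys - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    # One billion distinct keys against a 128-bit digest such as md5.
    print(collision_probability(10**9))
```

Under that assumption, a billion cached keys give a probability on the
order of 1e-21 of any accidental collision on a 128-bit digest.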

-- 
Maxim Dounin
http://nginx.org/



More information about the nginx-devel mailing list