Serve *only* from cache for particular user-agents

rge3 nginx-forum at nginx.us
Fri Feb 21 15:25:58 UTC 2014


I haven't found any ideas for this and thought I might ask here.  We have a
fairly straightforward proxy_cache setup with a proxy_pass backend.  We
cache documents for different lengths of time and go to the backend for
whatever is missing.  My problem is that we're getting overrun with bot and
spider requests.  MSN in particular started hitting us exceptionally hard
yesterday and began bringing our backend servers down.  Because the bots
crawl the site from end to end, many of the pages they request are not in
our cache, and nginx has to pass those requests through to the backend.

I'm looking for a way to match on User-Agent so that certain bots are
served *only* out of proxy_cache.  Ideally the logic would be:  if the page
is in the cache, serve it.  If it's not, return some 4xx error.  Either way,
requests from those user-agents should *never* go to the backend.  My first
thought was something like...

if ($http_user_agent ~* msn-bot) {
    proxy_pass http://devnull;
}

with devnull being a bogus backend.  But with nginx 1.4.3 (which is what
we're running) I get:

nginx: [emerg] "proxy_pass" directive is not allowed here
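
For reference, the proxy_pass documentation lists its allowed contexts as
location, "if in location" and limit_except, so this error is what one would
expect when the "if" block sits directly at server level.  Below is a
minimal, untested sketch of the cache-only idea with the "if" moved inside a
location.  The names "backend" and "pages", the addresses, port 8081 and the
cache path are placeholders (assumptions), not taken from the real setup.
The explicit proxy_cache_key is there because the default key,
$scheme$proxy_host$request_uri, includes $proxy_host, so a different
proxy_pass target would otherwise look up different cache entries.

    # Goes in the http context.  All names and addresses are hypothetical.
    proxy_cache_path /var/cache/nginx keys_zone=pages:10m;

    upstream backend {
        server 10.0.0.10:8080;        # the real application servers
    }

    server {
        listen 80;

        location / {
            proxy_cache       pages;
            # Key must not depend on the upstream name, so that both
            # proxy_pass targets below share the same cache entries.
            proxy_cache_key   $scheme$host$request_uri;
            proxy_cache_valid 200 10m;

            # Matching bots: a cache miss goes to the stub server only.
            if ($http_user_agent ~* msn-bot) {
                proxy_pass http://127.0.0.1:8081;
            }

            # Everyone else: a cache miss goes to the real backend.
            proxy_pass http://backend;
        }
    }

    # Stub "backend" that answers every request with an error.
    server {
        listen 127.0.0.1:8081;
        return 403;
    }

With this arrangement a cache hit is answered from the cache whichever
proxy_pass applies, while a miss from a matching user-agent reaches only the
local stub server, which replies 403.  Because only 200 responses are given
a validity period here, that 403 should not itself end up in the cache.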

Does anyone have another idea?

Thanks,
-Rick

Posted at Nginx Forum: http://forum.nginx.org/read.php?2,247837,247837#msg-247837


