Serve *only* from cache for particular user-agents

rge3 nginx-forum at nginx.us
Fri Feb 21 16:46:02 UTC 2014


Maxim Dounin Wrote:
-------------------------------------------------------
> Hello!
> 
> On Fri, Feb 21, 2014 at 10:25:58AM -0500, rge3 wrote:
> 
> > I haven't found any ideas for this and thought I might ask here.  We have a
> > fairly straightforward proxy_cache setup with a proxy_pass backend.  We
> > cache documents for different lengths of time or go to the backend for what's
> > missing.  My problem is we're getting overrun with bot and spider requests.
> > MSN in particular started hitting us exceptionally hard yesterday and
> > started bringing our backend servers down.  Because they're crawling the
> > site from end to end, our cache is missing a lot of those pages and nginx has
> > to pass the request on through.
> > 
> > I'm looking for a way to match on User-Agent and say that if it matches
> > certain bots to *only* serve out of proxy_cache.  Ideally I'd like the logic
> > to be:  if it's in the cache, serve it.  If it's not, then return some 4xx
> > error.  But in the case of those user-agents, *don't* go to the backend.
> > Only give them cache.  My first thought was something like...
> > 
> > if ($http_user_agent ~* msn-bot) {
> >       proxy_pass http://devnull;
> >  }
> > 
> > by making a bogus backend.  But in nginx 1.4.3 (that's what we're running) I
> > get
> > nginx: [emerg] "proxy_pass" directive is not allowed here
> > 
> > Does anyone have another idea?
> 
> The message suggests you are trying to write the snippet above at 
> server{} level.  Moving things into a location should do the 
> trick.
> 
> Please make sure to read http://wiki.nginx.org/IfIsEvil though.

That seems to have done it!  With a location block I now have...

    location / {
        proxy_cache_valid  200 301 302  30m;

        if ($http_user_agent ~* msn-bot) {
            proxy_pass http://devnull;
        }

        if ($http_user_agent !~* msn-bot) {
            proxy_pass http://productionrupal;
        }
    }

That seems to work perfectly.  But is it a safe use of "if"?  Is there a
safer way to do it without an if?
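
For reference, here is a rough, untested sketch of what an if-free version might look like, using map to pick the upstream by User-Agent.  It assumes the map block sits at http{} level and that devnull and productionrupal are both defined as upstream{} groups; the variable name $bot_backend is made up for illustration.

```nginx
# http{} level: choose an upstream name from the User-Agent.
# $bot_backend is a hypothetical variable name, not from the real config.
map $http_user_agent $bot_backend {
    default      productionrupal;
    ~*msn-bot    devnull;
}

# Inside the existing server{} block:
location / {
    proxy_cache_valid  200 301 302  30m;

    # With a variable in proxy_pass, nginx looks the name up among the
    # defined upstream{} groups first, so no "if" is needed here.
    proxy_pass http://$bot_backend;
}
```

With this layout a cache miss for a matching bot would presumably end in whatever error the bogus devnull upstream produces (most likely a 502) rather than a 4xx.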

Thanks for the help!
-R

Posted at Nginx Forum: http://forum.nginx.org/read.php?2,247837,247845#msg-247845


