Can nginx handle/cache this robot handling case?

Ian Evans ianevans at digitalhit.com
Mon Jun 24 07:45:33 UTC 2013


Hi everyone.

First, some background. I'm trying to integrate the method Pixabay uses to 
handle Google Image Search's new design, which makes it very easy (one 
button click) for visitors to view an image outside of the site's context. 
This has hit many sites' traffic and income hard.

This is how Pixabay used nginx to hijack that button so the image is seen 
in the site's context:

"Hotlinking protection and watermarking for Google Images": 
(http://pixabay.com/en/blog/posts/hotlinking-protection-and-watermarking-for-google-32/)

Part of the method uses what they call "trap URLs", i.e. adding "?i" to 
the img URLs in the page source when the page is viewed by a human. Bots 
like googlebot don't see the "?i", so nginx handles them differently:

if ($args = "i") { set $watermark 0; }
if ($watermark = 1) {
    add_header Cache-Control "no-cache, must-revalidate";
    # optional: serve a watermarked version of the image
    # rewrite "IMAGE_URL_REGEX" WATERMARK_URL last;
}
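
For clarity, here's my rough guess at how that snippet sits inside a full 
image-serving location; the location regex and the default "set $watermark 1;" 
are my assumptions, not quoted from the Pixabay post:

location ~* \.(jpe?g|png|gif)$ {
    # assume bot/hotlink traffic by default
    set $watermark 1;
    # humans request the "trap URL" with ?i appended, so clear the flag
    if ($args = "i") { set $watermark 0; }
    if ($watermark = 1) {
        add_header Cache-Control "no-cache, must-revalidate";
        # rewrite "IMAGE_URL_REGEX" WATERMARK_URL last;
    }
}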


Pixabay adds the "?i" to the img tags in their templates. I was looking 
for a method that was a little more caching/performance friendly, so I 
suggested using jQuery to append the "?i" at the document-ready stage: 
bots would never see the "?i", and the page could still be cached because 
the "?i" is added on the client side, e.g.

// append the "?i" trap parameter to each matching image's src
$("img.myimg").attr("src", function(i, src) { return src + "?i"; });

The only problem is that each image now hits the server twice: once when 
the browser loads /the/file/location/img.jpg, and a second time after 
jQuery changes the src to /the/file/location/img.jpg?i.

It was suggested that I could add Varnish to the stack and strip the "?i" 
from the URLs for bots there, but I didn't want to add another layer.

Can nginx and, say, the fastcgi cache (which I use) handle this situation 
natively? Let's say all pages have the "?i" at the end of their image URLs, 
so it's already there for the majority of traffic (humans). Is there an 
efficient way for nginx, upon detecting a bot user agent, to strip the "?i" 
(perhaps with http://wiki.nginx.org/HttpSubsModule), then gzip, serve and 
cache that version, while serving/caching the original version to browsers?
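
To make the question concrete, here's the kind of thing I'm imagining. This 
is an untested sketch: the bot patterns, the "pagecache" zone (assumed to be 
defined elsewhere with fastcgi_cache_path), the socket path, and the 
assumption that subs_filter accepts a variable on the replacement side are 
all mine:

# at http{} level: classify the client once per request
map $http_user_agent $img_suffix {
    default                  "?i";   # humans keep the trap parameter
    ~*(googlebot|bingbot)    "";     # known bots get the plain URL
}

# inside the existing server{} block / PHP location
location ~ \.php$ {
    include            fastcgi_params;
    fastcgi_pass       unix:/var/run/php-fpm.sock;

    # keep the bot and human variants of a page apart in the cache
    fastcgi_cache      pagecache;
    fastcgi_cache_key  "$scheme$request_method$host$request_uri$img_suffix";

    # rewrite the img URLs in the HTML: a no-op for humans,
    # strips the ?i for bots before the page is gzipped and sent
    subs_filter_types  text/html;
    subs_filter        ".jpg?i" ".jpg$img_suffix";
}

If subs_filter can't take a variable there, I assume two locations or a 
conditional would be needed instead; that's part of what I'm unsure about.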

It's a long post, I know! I just have an inkling that my fave server can 
handle this; I just don't have the experience to configure it.

Thanks for any insight.


