Can nginx handle/cache this robot handling case?
Ian Evans
ianevans at digitalhit.com
Mon Jun 24 07:45:33 UTC 2013
Hi everyone.
First, some background. I'm trying to integrate the method Pixabay uses to
handle Google Image Search's new design, which makes it very easy (one
button click) for visitors to view an image outside of the site's context.
This has hit many sites' traffic and income hard.
This is how Pixabay got nginx to handle hijacking the button so the
image can be seen in the site's context:
"Hotlinking protection and watermarking for Google Images":
(http://pixabay.com/en/blog/posts/hotlinking-protection-and-watermarking-for-google-32/)
Part of the method is what they call "trap URLs", i.e. adding "?i" to img
URLs in the page source when it's viewed by a human. Bots like Googlebot
never get the ?i, so nginx can treat them differently:
set $watermark 1;   # default (the config has to set this before the checks below can fire)
if ($args = "i") { set $watermark 0; }   # trap argument present: a human visitor on our own pages
if ($watermark = 1) {
    add_header Cache-Control "no-cache, must-revalidate";
    # rewrite "IMAGE_URL_REGEX" WATERMARK_URL last;   # optional: serve a watermarked version of the image
}
Pixabay adds the "?i" to the img tags in their templates. I was looking for
a method that was a little more caching/performance friendly, so I suggested
using jQuery to append the "?i" in the browser at document ready. Bots would
never see the ?i, and the page could still be cached because the "?i" is
added on the client side, e.g.:
$(function () {   // at document ready
    // append the trap argument to each matched image's own src
    $("img.myimg").attr("src", function (i, src) { return src + "?i"; });
});
The only problem is that each image now hits the server twice: once when the
browser loads /the/file/location/img.jpg, and again after jQuery changes the
src to /the/file/location/img.jpg?i.
It was suggested that I could add Varnish to my stack and strip the ?i from
the URLs for bots there, but I didn't want to add another layer to the
stack.
Can nginx and, say, the fastcgi cache (which I use) handle this situation
natively? Say all pages have the "?i" at the end of their image URLs, so
it's already there for the majority of traffic (humans). Is there an
efficient way for nginx, upon detecting a bot user agent, to strip the ?i
(perhaps with http://wiki.nginx.org/HttpSubsModule), then gzip, serve and
cache that version, while serving/caching the original version to browsers?
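Something like the rough, untested sketch below is what I'm imagining. The
bot list, paths and the return-418/error_page trick for routing bots to a
named location are all guesses on my part; it also assumes the FastCGI
backend isn't already gzipping its output (sub_filter needs plain text) and
that nginx was built with the sub module:

    http {
        # classify requests by user agent (bot list is just an example)
        map $http_user_agent $is_bot {
            default        0;
            ~*googlebot    1;
            ~*bingbot      1;
        }

        fastcgi_cache_path /var/cache/nginx keys_zone=PAGES:10m;

        server {
            listen 80;
            server_name example.com;   # placeholder

            location / {
                error_page 418 = @bots;            # internal redirect to the named location below
                if ($is_bot) { return 418; }

                include       fastcgi_params;      # plus the usual SCRIPT_FILENAME etc.
                fastcgi_pass  unix:/var/run/php-fpm.sock;   # placeholder backend
                fastcgi_cache PAGES;
                fastcgi_cache_key "$scheme$request_method$host$request_uri";
            }

            location @bots {
                # strip the trap argument from the HTML served to bots;
                # the stock sub module matches a fixed string, HttpSubsModule could do it with a regex
                sub_filter      '.jpg?i' '.jpg';
                sub_filter_once off;
                gzip            on;

                include       fastcgi_params;
                fastcgi_pass  unix:/var/run/php-fpm.sock;
                fastcgi_cache PAGES;
                fastcgi_cache_key "bot:$scheme$request_method$host$request_uri";   # bot copies cached separately
            }
        }
    }

Does that look sane, or is there a cleaner way to branch on the user agent
without the 418 trick?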
It's a long post (!), but I have an inkling that my fave server can handle
this. I just don't have the experience to configure it.
Thanks for any insight.