Help: How to deal with content scrapers?
konfoo at gmail.com
Thu Apr 23 04:44:21 MSD 2009
On Wed, Apr 22, 2009 at 5:17 PM, davidr <nginx-forum at nginx.us> wrote:
> What's the best way to limit the number of requests an IP can make in a given time period, say 15 minutes? Is there a way to block them at the webserver (nginx) layer instead of the application layer, since app-layer blocking incurs too much of a performance hit? I'm looking for something that would simply count the number of requests over a particular time period and add the IP to iptables if it ever crosses the limit.
You could try fail2ban; it's pretty easy to build rules for it.
The trick is that you don't want it monitoring your main nginx access
log. Instead, place links to bogus URLs in your HTML pages, hidden so
that a human visitor never sees or follows them. When a scraper
requests one of the bogus links, nginx writes an entry to your error
log. Your fail2ban monitor then watches only the error log, and once a
host produces x such entries within y amount of time, it blocks that
host in iptables.
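
To make the "x entries in y amount of time" rule concrete, here is a
rough Python sketch of the counting logic. It is not how fail2ban is
implemented, only an illustration of the idea. The /trap/ URL prefix,
the threshold of 3 hits, and the 15 minute window are made-up values,
and the regex assumes the usual nginx error log line layout
(timestamp, "[error]", "client: IP", "request: ..."); check it
against your own log before relying on it.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Assumed nginx error-log layout, e.g.:
# 2009/04/23 04:44:21 [error] 1#0: *1 open() "..." failed (...),
#   client: 10.0.0.5, server: ..., request: "GET /trap/a HTTP/1.1", ...
LINE_RE = re.compile(
    r'^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[error\] .*?'
    r'client: (?P<ip>[0-9a-fA-F.:]+),.*?request: "\S+ (?P<path>\S+)'
)

TRAP_PREFIX = "/trap/"          # hypothetical prefix for the bogus links
MAX_HITS = 3                    # "x": hypothetical threshold
WINDOW = timedelta(minutes=15)  # "y": hypothetical time window


def offenders(lines, max_hits=MAX_HITS, window=WINDOW,
              trap_prefix=TRAP_PREFIX):
    """Return the IPs that requested trap URLs max_hits times within window.

    Lines are assumed to be in chronological order, as in a log file.
    """
    recent = defaultdict(deque)  # ip -> timestamps of recent trap hits
    flagged = set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m or not m.group("path").startswith(trap_prefix):
            continue  # not an error line for one of the bogus URLs
        ts = datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S")
        hits = recent[m.group("ip")]
        hits.append(ts)
        # Forget hits that have fallen out of the sliding window.
        while ts - hits[0] > window:
            hits.popleft()
        if len(hits) >= max_hits:
            flagged.add(m.group("ip"))
    return flagged
```

Feeding it the lines of the error log gives back the set of hosts
that tripped the rule. In a real setup fail2ban does this counting
for you and adds the iptables rule itself, so a script like this is
only useful for checking which hosts a given threshold would catch.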