Token bucket to limit bots and site grabbers

Maxim Dounin mdounin at mdounin.ru
Mon Feb 15 14:33:58 MSK 2010


Hello!

On Mon, Feb 15, 2010 at 11:34:16AM +0100, Tobia Conforto wrote:

> Is there any module I can use to limit or deny access to bots 
> and site grabbers, based on the long-term request rate?
> 
> I'm thinking of a token bucket with a timeframe of hours or 
> days, where a legitimate user will only download, say, 50 pages 
> (images and css excluded) per day, from a single ip address. 
> Bots will obviously try and grab more content than that. Even if 
> they set a long delay between requests, the overall number of 
> requests per day will be much higher than that of a legitimate 
> user.
> 
> limit_req is not what I'm looking for, because it has a short 
> timeframe of seconds or minutes, and because this kind of limit 
> requires a token bucket, not a leaky bucket.

To turn limit_req into a token bucket, it's enough to specify the 
"nodelay" flag.
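To see why "nodelay" is enough, here is a simplified Python model of a limit_req-style limiter. It is a sketch under stated assumptions, not nginx's actual code: the class name `LimitReqSketch`, the float `excess` bookkeeping and the return values are illustrative. Requests that would push `excess` past `burst` are rejected either way; the only difference is whether admitted excess requests are delayed to smooth the rate (leaky bucket) or served at once (token bucket).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LimitReqSketch:
    """Hypothetical model of a limit_req-style limiter (not nginx source).

    `excess` counts admitted requests above the configured rate that have
    not yet drained; it leaks away at `rate` requests per second.
    """
    rate: float                 # long-term allowed requests per second
    burst: float                # how far `excess` may grow before rejecting
    nodelay: bool = False
    excess: float = 0.0
    last: Optional[float] = None

    def request(self, now: float) -> Optional[float]:
        """Return None if rejected, otherwise the delay in seconds before serving."""
        if self.last is None:
            new_excess = 0.0    # first request from this client
        else:
            elapsed = now - self.last
            # Drain at `rate`, never below zero, then count this request.
            new_excess = max(self.excess - self.rate * elapsed, 0.0) + 1.0
        if new_excess > self.burst:
            return None         # overflow: reject (nginx answers 503)
        self.excess = new_excess
        self.last = now
        if self.nodelay:
            return 0.0          # token bucket: serve at once while burst lasts
        return self.excess / self.rate  # leaky bucket: queue to smooth the rate
```

With `rate=1.0, burst=5.0, nodelay=True`, seven requests at the same instant give six immediate acceptances and one rejection; without `nodelay` the same six are accepted but delayed by 0 to 5 seconds.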

It should be relatively easy to extend the supported time frames, 
too (rates can currently be given only per second or per minute). 
It's not as easy as just adding another line of configuration 
parsing, but I believe it's something that should be done.
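Tobia's example (about 50 pages per day from a single IP address) can be sketched as a plain token bucket with a one-day period. This is a hypothetical standalone Python model, not a limit_req feature: `DailyTokenBucket`, `allow_request` and the in-memory per-IP dictionary are illustrative names. The bucket holds 50 tokens and refills at 50/86400 tokens per second, so a bot that spaces its requests out still cannot sustain more than about 50 requests per day once its initial allowance is spent.

```python
DAY = 86400.0  # seconds


class DailyTokenBucket:
    """Hypothetical token bucket for long time frames (not an nginx API).

    Holds up to `capacity` tokens and refills at `capacity / period` tokens
    per second, so a client averages at most `capacity` requests per `period`.
    """

    def __init__(self, capacity: int, period: float) -> None:
        self.capacity = float(capacity)
        self.refill_rate = capacity / period  # tokens per second
        self.tokens = float(capacity)         # a new client starts with a full bucket
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Add tokens accrued since the last check, capped at capacity.
        elapsed = max(now - self.last, 0.0)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0                # spend one token on this request
            return True
        return False


# One bucket per client address (hypothetical in-memory store).
buckets: dict = {}


def allow_request(ip: str, now: float) -> bool:
    """Apply a 50-pages-per-day limit to the given client address."""
    if ip not in buckets:
        buckets[ip] = DailyTokenBucket(50, DAY)
    return buckets[ip].allow(now)
```

A bot fetching a page every 10 minutes (144 requests in a day) gets just under 100 of them through on its first day, the initial 50 tokens plus about one day's refill, and is held to about 50 per day afterwards; a user fetching 50 pages a day is never refused. A real deployment would also need to expire idle entries and count only page requests, not images and CSS.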

Maxim Dounin


