Two types of rate limiting based on IP address

Antonio P.P. Almeida appa at perusio.net
Mon Mar 12 14:34:11 UTC 2012


> Hi all,
>
>  How can I maintain two rate-limiting strategies? One for spiders and one
> for regular users?
>
> I can get the IP address list of spiders from
>
> http://www.iplists.com/. Can I separate them using geo? Have people
> attempted this?
>
> My website is being pounded by some screen scrapers and I want to block
> them, but not at the risk of blocking search engine spiders.

Do you realize that, by going that way, regular users will be subject to the
same request limiting as the bad spiders? You can try to do further
selection on the User-Agent (UA), but bad spiders have a habit of providing
bogus UA strings.
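
If you do try UA selection anyway, a map (also at the http level) can tag
requests whose User-Agent claims to be a known crawler. Just a rough sketch:
the variable name and patterns below are illustrative, and a forged UA will
match just as well:

map $http_user_agent $ua_claims_spider {
    default                          0;
    # illustrative patterns only; forged UAs match too
    ~*(googlebot|bingbot|yandexbot)  1;
}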

At the http level:

geo $good_spider {
    default 0;
    #list all good spider IPs
}

limit_req_zone $binary_remote_addr zone=bad_spiders:10m rate=1r/s;
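
For concreteness, a populated geo block might look like this; the CIDR blocks
below are placeholders taken from the documentation address ranges, not real
crawler networks, so substitute whatever you collect from iplists.com:

geo $good_spider {
    default          0;
    # placeholder ranges; replace with the real crawler networks
    192.0.2.0/24     1;
    198.51.100.0/24  1;
}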

At the server (vhost) level:

location / {
    limit_req zone=bad_spiders burst=5;

    # '=' lets the final status come from @good-spiders instead of 418
    error_page 418 = @good-spiders;

    if ($good_spider) {
        return 418;
    }
    #...
}

location @good-spiders {
    # no limits here
    #...
}
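
The return 418 / error_page pair is just a trick to get out of the limited
location from inside an if: the otherwise unused 418 status triggers an
internal redirect to @good-spiders, which carries no limit_req, and the '='
lets the final response code come from whatever that location serves. How
@good-spiders handles content depends on your setup; the root and try_files
below are assumptions, not a prescription:

location @good-spiders {
    # same content handling as location /, but with no limit_req
    root      /var/www/example;   # assumed document root
    try_files $uri $uri/ =404;
}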

--appa


