limit_rate based on User-Agent; how to exempt /robots.txt ?
Cameron Kerr
cameron.kerr at otago.ac.nz
Tue Aug 7 22:27:23 UTC 2018
Hi Maxim, that's very helpful...
> -----Original Message-----
> From: nginx [mailto:nginx-bounces at nginx.org] On Behalf Of Maxim Dounin
> On Tue, Aug 07, 2018 at 02:45:02AM +0000, Cameron Kerr wrote:
> > Option 3: (does not work)
> This approach is expected to work fine (assuming you've used limit_req
> somewhere), and I've just tested the exact configuration snippet provided
> to be sure. If it doesn't work for you, the problem is likely elsewhere.
Thank you for the confirmation; I've retried it, and testing with ab, it seems to work, so I'm not sure what I was doing wrong previously.
I like the pattern of chaining maps; it's nicely functional in my way of thinking.
For the sake of others, my configuration looks like the following:
http {
    # Classify the client: empty key for ordinary browsers, "robot" for spiders.
    map $http_user_agent $user_agent_rate_key {
        default                            "";
        "~*(bot[/-]|crawler|robot|spider)" "robot";
        "~ScienceBrowser/Nutch"            "robot";
        "~Arachni/"                        "robot";
    }

    # Second map overrides the key to "" for /robots.txt, exempting it;
    # requests with an empty key are not counted against the zone.
    map $uri $rate_for_spider_exempting {
        default       $user_agent_rate_key;
        "/robots.txt" "";
    }

    limit_req_zone $rate_for_spider_exempting zone=per_spider_class:1m rate=100r/m;
    limit_req_status 429;

    server_tokens off;

    server {
        limit_req zone=per_spider_class;

        location / {
            proxy_pass http://routing_layer_http/;
        }
    }
}
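For anyone adapting this: the exemption relies on limit_req_zone ignoring requests whose key evaluates to an empty string, which is why mapping /robots.txt to '' removes it from the zone entirely. If a hard 100r/m cut-off is too abrupt for well-behaved crawlers, limit_req also accepts burst and nodelay parameters; a sketch (the burst value here is my own illustration, not something from this thread):

```nginx
# Permit short bursts of up to 10 requests beyond the 100r/m rate before
# answering 429; "nodelay" serves the burst immediately instead of queuing.
limit_req zone=per_spider_class burst=10 nodelay;
```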
And my testing:
// spider with non-exempted (ie. rate-limited for spiders) URI
$ ab -H 'User-Agent: spider' -n100 https://.../hostname | grep -e '^Complete requests:' -e '^Failed requests:'
Complete requests: 100
Failed requests: 98
// spider with exempted (ie. no-rate-limiting for spiders) URI
$ ab -H 'User-Agent: spider' -n100 https://.../robots.txt | grep -e '^Complete requests:' -e '^Failed requests:'
Complete requests: 100
Failed requests: 0
// non-spider with exempted (ie. no-rate-limiting for spiders) URI
$ ab -n100 https://.../robots.txt | grep -e '^Complete requests:' -e '^Failed requests:'
Complete requests: 100
Failed requests: 0
// non-spider with non-exempted (ie. rate-limited for spiders) URI
$ ab -n100 https://.../hostname | grep -e '^Complete requests:' -e '^Failed requests:'
Complete requests: 100
Failed requests: 0
Thanks again for your feedback.
Cheers,
Cameron