limit_req for spiders only

Toni Mueller support-nginx at oeko.net
Mon Oct 14 14:02:39 UTC 2013


Hello,

On Mon, Oct 14, 2013 at 09:25:24AM -0400, Sylvia wrote:
> Doesnt robots.txt "Crawl-Delay" directive satisfy your needs? 

I already have it there, but I don't know how long it takes for such a
directive, or any change to robots.txt for that matter, to take effect.
Judging from the logs, the delay between changing robots.txt and a change
in robot behaviour seems to be several days, as I cannot see any effect
so far.
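
Just for reference, this is the kind of entry I mean (the delay value is
only an example, and Crawl-Delay is a non-standard extension, so crawlers
may interpret it differently):

    User-agent: *
    Crawl-Delay: 10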

> Normal spiders should obey robots.txt, if they dont - they can be banned.

Banning Google is not a good idea, no matter how abusive they might be,
and they incidentally operate one of the robots that keep hammering the
site. I'd much prefer a technical solution that enforces such limits over
relying on convention.

I'd also like to limit the request rate across an entire pool, so that I
can say "clients from this pool can make requests only at this rate,
combined, not per client IP". It doesn't buy me anything to limit each
individual search robot to a decent rate if I then get hammered by 1000
search robots in parallel, each one observing the limit on its own.
Right?
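
Something along these lines is what I have in mind (untested sketch; the
zone name, rate and user-agent patterns are made up). Since limit_req_zone
accepts an arbitrary variable as its key (nginx 1.1.8+), a map that hands
every recognised spider the same constant key should pool them all into
one shared bucket, while the empty key for all other clients means they
are not limited at all:

    # http context: classify requests by User-Agent
    map $http_user_agent $spider_pool {
        default                        "";
        ~*(googlebot|bingbot|yandex)   "spiders";
    }

    # one shared bucket for the whole pool, 1 request/second combined
    limit_req_zone $spider_pool zone=spiders:10m rate=1r/s;

    # server or location context
    limit_req zone=spiders burst=10;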


Kind regards,
--Toni++


