rate limit with good bot IPs whitelisted

neubyr neubyr at gmail.com
Sat Nov 22 17:42:45 UTC 2014


Thank you Oleksandr!!

On Sat, Nov 22, 2014 at 7:33 AM, Oleksandr V. Typlyns'kyi <
wangsamp at gmail.com> wrote:

> Yesterday Nov 21, 2014 at 20:07 neubyr wrote:
>
> > I am trying to figure out if there is any way to rate limit all traffic
> > except Googlebot, msnbot, yandex and baidu bots. Here is what I have
> > started with:
> >
> >   # Whitelisted IPs
> >   geo $rate_limit_ip {
> >       default $binary_remote_addr;
> >       127.0.0.1 "";
> >       10.0.0.0/8 "";
> >   }
> >
> >   # Rate limit
> >   limit_req_zone $rate_limit_ip zone=publix:10m rate=10r/s;
>
>  It will not work as you expect.
>  Geo does not support variables in values.
>  You need something like this:
>  geo $whitelist {
>      default 0;
>      127.0.0.1 1;
>      ...
>  }
>  map $whitelist $rate_limit_ip {
>      default $binary_remote_addr;
>      1       "";
>  }
>
>
I am not sure how, but it appears to work with only the geo block defining
the IP addresses. I can see HTTP 503 on the client side and also 'limiting
requests, excess: 10.033 by zone' in the error logs. Nginx version: nginx/1.6.0

    geo $rate_limit_ip {
        default $binary_remote_addr;
        127.0.0.1 1;
        10.0.0.0/8 1;
    }
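
A possible explanation for the observation above, offered as an assumption
rather than something confirmed in this thread: since geo values are literal
strings, the default key here is the fixed text "$binary_remote_addr", so all
non-whitelisted clients would share one bucket, and the value 1 is a non-empty
key, so whitelisted addresses would still be counted. A minimal sketch of the
geo plus map variant suggested in the quoted reply, combined with the
limit_req_zone line from the original question, could look like this (the
location block and the burst value are illustrative assumptions):

```nginx
# Sketch for the http {} context; addresses and zone name come from this thread.
geo $whitelist {
    default     0;
    127.0.0.1   1;
    10.0.0.0/8  1;
}

# Unlike geo, map values may contain variables.
map $whitelist $rate_limit_ip {
    default $binary_remote_addr;   # per-client key for everyone else
    1       "";                    # empty key: request is not accounted
}

limit_req_zone $rate_limit_ip zone=publix:10m rate=10r/s;

server {
    location / {
        # burst=20 is an assumed value, not taken from the thread
        limit_req zone=publix burst=20;
    }
}
```

With this layout, requests from whitelisted addresses produce an empty key and
should not show up as 'limiting requests' entries in the error log.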


> > I can add Googlebot, msnbot, yandex and baidu IP ranges manually to the
> > whitelist, but that will make the lookup table big. I am not sure whether
> > this approach will work for high traffic, e.g. 1200 requests/second
> > distributed across 20 nginx hosts. Any ideas on such a setup will be
> > really helpful.
>
>   Nginx parses and loads this data into a radix tree in memory on startup.
>
> > Also, can such host lookups be done in real-time for every request? I am
> > guessing that may not be efficient for each request, but I was wondering
> > if there are any solutions.
>
>   All variables are evaluated when they are used in a request.
>
>
I was wondering if a hostname lookup for the remote IP can be done before
rate-limiting it. For example, I don't want to block IPs coming from
baidu.com. Can I do such an IP-to-hostname lookup before rate-limiting? Will
it be efficient, or what are the other options?

Thanks again for the detailed reply.

- N