GeoIP Module

Nam nginx-forum at nginx.us
Wed Dec 1 21:23:17 MSK 2010


Igor Sysoev Wrote:
-------------------------------------------------------
> On Wed, Dec 01, 2010 at 05:05:56AM -0500, Nam wrote:
> 
> > Hey Guys, I have run into a problem with the geo module. I have set
> > up a geo list containing a LARGE list of IPs which we need to have
> > "whitelisted" for getting through to the upstream. These IPs are for
> > search engines. Currently we have the list set up via the following
> > way...
> > 
> > geo $remote_addr $search  {
> >         default          0;
> >         include          geoip-search.conf;
> > }
> > 
> > The geoip-search.conf file contains the list of IPs in the following
> > format...
> > 
> > 114.111.36.26/32  search;
> > 114.111.36.28/32  search;
> > 114.111.36.29/32  search;
> > 114.111.36.30/32  search;
> > 114.111.36.31/32  search;
> > 114.111.36.32/32  search;
> > 119.63.193.100/32  search;
> > 119.63.193.101/32  search;
> > 119.63.193.102/32  search;
> > 119.63.193.103/32  search;
> > 
> > Then inside of the configurations, we do the following... which was
> > based on recommendations from Igor...
> > 
> > if ( $search = search ) {
> >     proxy_pass  http://LB_HTTP_UPSTREAM;
> >     break;
> > }
> > 
> > Then under that we also have some security checks which look for a
> > cookie and serve a different page if no cookie is present. We want
> > the search engine IPs to be able to make it through to the upstream,
> > but it appears that this is no longer occurring. We had no problems
> > in the past... Perhaps it is due to something in 0.8.53, as we had
> > upgraded to that a while ago, and after a while we got complaints of
> > google bots not getting through. Our list contains about 40,000
> > lines, which covers well over 100,000 IPs. Anyone have any ideas on
> > what could be causing this?
> 
> It should work. Could you create a debug log of the request?

That may be a bit difficult... do you need to see the debug log from the
requests NOT getting through, or just any request? Our servers are
currently pushing well over 150mbps of traffic, so we cannot put them
into debug mode and start messing around, but we can test it on our test
server and get a single request's worth of debug log data.

> BTW, you may compress geo file using this script:
> 
> ------
> #!/usr/bin/perl -w
> 
> use Net::CIDR::Lite;
> use strict;
> use warnings;
> 
> my %cidr;
> 
> while (<>) {
>     if (/^(\S+)\s+(\S+);/) {
>         my($net, $region) = ($1, $2);
>         if (!defined $cidr{$region}) {
>             $cidr{$region} = Net::CIDR::Lite->new;
>         }
>         $cidr{$region}->add($net);
>     }
> }
> 
> for my $region (sort { $a cmp $b } keys %cidr) {
>     print((join " $region;\n", $cidr{$region}->list), " $region;\n");
> }
> ------
> 
> For example, the 10 lines above are compressed to just 4:
> ------
> 114.111.36.26/32 search;
> 114.111.36.28/30 search;
> 114.111.36.32/32 search;
> 119.63.193.100/30 search;
> ------

Awesome, we will have to run our large list through that. We have a lot
of IPs in there, so this script will be handy. Thanks Igor.
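For anyone without Net::CIDR::Lite handy, the same aggregation can be
sketched with Python's standard-library ipaddress module. This is a
hypothetical equivalent, not Igor's script; the function name
compress_geo is made up for illustration:

```python
import ipaddress
from collections import defaultdict

def compress_geo(lines):
    """Merge adjacent and contained CIDR blocks per region value."""
    regions = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[1].endswith(";"):
            net, region = parts
            regions[region.rstrip(";")].append(ipaddress.ip_network(net))
    out = []
    for region in sorted(regions):
        # collapse_addresses merges adjacent/contained networks, e.g.
        # .28/32 + .29/32 + .30/32 + .31/32 -> .28/30
        for net in ipaddress.collapse_addresses(regions[region]):
            out.append(f"{net} {region};")
    return out

sample = [
    "114.111.36.26/32  search;",
    "114.111.36.28/32  search;",
    "114.111.36.29/32  search;",
    "114.111.36.30/32  search;",
    "114.111.36.31/32  search;",
    "114.111.36.32/32  search;",
    "119.63.193.100/32  search;",
    "119.63.193.101/32  search;",
    "119.63.193.102/32  search;",
    "119.63.193.103/32  search;",
]
print("\n".join(compress_geo(sample)))
```

Run against the 10 example lines above, this collapses them to the same
4 entries as the Perl script.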

> 
> Also, if you use the original client $remote_addr, then this
> 
> -geo $remote_addr $search  {
> +geo $search  {
>          default          0;
>          include          geoip-search.conf;
> }
> 
> will work slightly faster.

Sounds good, we will implement that as well.

> 
> Also, you may avoid "if":
> 
> geo $search {
>     default  usual_upstream;
>     ...      search_upstream;
>     ...      search_upstream;
>     ...      search_upstream;
>     ...      search_upstream;
>     ...
> }
> 
> upstream search_upstream {
>     ...
> }
> 
> upstream usual_upstream {
>     ...
> }
> 
> server {
>     location / {
>         proxy_pass  http://$search;
>     }
> }

This is a very nice idea/feature, but unfortunately it will not work in
our case because we use this list across many sites we host. Some sites
have additional security features in place which need to always be
bypassed for search engine crawlers and our monitoring systems. In
configs which use security features, we include if statements to ensure
that those requests get a proxy_pass to that config's upstream.
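For the multi-site case described here, a rough sketch of what one such
per-site config might look like (the upstream name SITE_UPSTREAM, the
cookie name "auth", and the fallback page are all made up for
illustration; this is not our actual config):

```
geo $search {
    default          0;
    include          geoip-search.conf;   # entries map to "search"
}

server {
    server_name  example-site.com;        # hypothetical site

    location / {
        # Search engine crawlers bypass the cookie check entirely.
        if ( $search = search ) {
            proxy_pass  http://SITE_UPSTREAM;
            break;
        }

        # Everyone else must present the (hypothetical) auth cookie.
        if ( $cookie_auth = "" ) {
            rewrite  ^  /check.html  break;
        }

        proxy_pass  http://SITE_UPSTREAM;
    }
}
```

The geo block is shared across sites via include, while each site keeps
its own if/proxy_pass pair pointing at that site's upstream.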

> 
> -- 
> Igor Sysoev
> http://sysoev.ru/en/
> 
> _______________________________________________
> nginx mailing list
> nginx at nginx.org
> http://nginx.org/mailman/listinfo/nginx

Posted at Nginx Forum: http://forum.nginx.org/read.php?2,2228,154830#msg-154830



