GeoIPCity with nginx
Maxim Dounin
mdounin at mdounin.ru
Sun Nov 23 19:37:23 MSK 2008
Hello!
On Sun, Nov 23, 2008 at 02:12:25PM +0300, Igor Sysoev wrote:
> On Sun, Nov 23, 2008 at 12:05:48AM +0300, Maxim Dounin wrote:
>
> > On Sat, Nov 22, 2008 at 05:18:31PM +0100, Bobby Dr wrote:
> >
> > > I know Maxmind's GeoIP Country database can be used easily with nginx.
> > > But what about their Geo-City database?
> > >
> > > The default CSV database stands at > 100MB in size (and will grow even
> > > larger if the two normalized files are merged together). For this
> > > reason, using the CIDR format may not be feasible (due to excessive
> > > memory requirement)
> > >
> > > The binary file is much smaller however.
> >
> > The problem with MaxMind's city database, afaik, is that the text
> > data they provide isn't CIDRs but IP ranges. This is
> > generally good for relational databases, but the worst case for those
> > who are able to work with CIDRs.
> >
> > Binary file afaik is radix tree dump with real cidrs, that's why it's
> > much smaller.
> >
> > Theoretically it should be possible to collapse IP ranges into an
> > optimal set of CIDRs to make this usable with the native nginx geo
> > module, but this isn't really an easy task.
>
> No, single IP allocations may be equal to several CIDRs,
> for example, some time ago I saw this IP range:
>
> inetnum: 94.25.31.248 - 94.25.43.251
>
> that is equal to 10 CIDRs:
>
> 94.25.31.248/29
> 94.25.32.0/21
> 94.25.40.0/23
> 94.25.42.0/24
> 94.25.43.0/25
> 94.25.43.128/26
> 94.25.43.192/27
> 94.25.43.224/28
> 94.25.43.240/29
> 94.25.43.248/30
>
> And if you convert Maxmind GeoCity base file
> GeoLiteCity_20081101/GeoLiteCity-Blocks.csv that has 3014818 ip ranges
> you will get 4125519 CIDR - one third more.
>
> The increase is due to IP allocations like the one I showed above
> and due to Maxmind errors - they may split a single CIDR into 3 ranges, as in:
>
> 10.0.0.1-10.0.0.1
> 10.0.0.2-10.0.0.254
> 10.0.0.254-10.0.0.255
This is somewhat obvious. I'm talking about the situation where a
small number of CIDRs produces multiple non-overlapping ranges, e.g.
10.0.0.0/8 1;
10.255.255.127/32 2;
will result in the following ranges:
10.0.0.0-10.255.255.126 1;
10.255.255.127-10.255.255.127 2;
10.255.255.128-10.255.255.255 1;
and this in turn will result in a huge number of CIDRs.
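
To make the blow-up concrete, here is a minimal sketch (not anything from nginx or MaxMind) that converts ranges back to CIDRs with Python's standard ipaddress module. The helper name range_to_cidrs is made up for illustration; the inputs are the ranges quoted in this thread.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Return the minimal list of CIDR blocks covering first..last inclusive."""
    return list(ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)))

# Igor's example: a single inetnum range.
igor = range_to_cidrs("94.25.31.248", "94.25.43.251")
print(len(igor))  # 10

# The three ranges produced from "10.0.0.0/8 1;" plus "10.255.255.127/32 2;".
ranges = [
    ("10.0.0.0", "10.255.255.126"),
    ("10.255.255.127", "10.255.255.127"),
    ("10.255.255.128", "10.255.255.255"),
]
total = sum(len(range_to_cidrs(first, last)) for first, last in ranges)
print(total)  # 25 CIDRs, where the original definition needed only 2
```

So two CIDR definitions, once flattened to non-overlapping ranges and converted back, come out as 25 CIDRs in this case.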
> > > Has anybody been able to use the geo-city database with nginx? For
> > > apache MaxMind provides mod_geoip which works on the binary file, making
> > > it very fast.
> > >
> > > Does anyone have any solution (like mod_geoip) for nginx? I'm using PECL
> > > geoip for PHP and the one for ruby. But I feel, geo lookup at the server
> > > level would be much faster.
>
> Last week I sped up loading of huge geo bases (like Maxmind's one);
> it will be in 0.7.23. However, the memory footprint is large: the Maxmind base
> takes about 250M on i386 (fortunately, the memory is shared between master
> and workers on a VM copy-on-write basis).
Sounds good anyway. :)
> Yesterday I investigated using ranges instead of CIDRs: the in-memory base
> will take about 25M, like Maxmind's one. However, the memory footprint in top
> will be the same, as modern malloc()s in FreeBSD and probably Linux lazily
> free memory using madvise(MADV_FREE), and nginx uses a lot of memory
> while handling the base on reconfiguration.
>
> The search should be as fast as a simple radix tree, maybe even faster:
> the simple radix tree goes through a short loop, but it causes tens of TLB
> and cache misses, while searching for a suitable range goes through a longer
> loop, but it causes only several TLB and cache misses.
This should be an interesting alternative for range-centric bases.
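
For illustration only, a range lookup could look roughly like the sketch below. It assumes sorted, non-overlapping ranges and uses binary search over the range start addresses; Igor does not say which search he uses, so the binary search and all names here (RANGES, STARTS, lookup) are my assumptions, not nginx code.

```python
import bisect
import ipaddress

def ip(s):
    """Convert a dotted-quad string to an integer."""
    return int(ipaddress.ip_address(s))

# Sorted, non-overlapping (start, end, value) ranges from the example above.
RANGES = sorted([
    (ip("10.0.0.0"), ip("10.255.255.126"), "1"),
    (ip("10.255.255.127"), ip("10.255.255.127"), "2"),
    (ip("10.255.255.128"), ip("10.255.255.255"), "1"),
])
STARTS = [start for start, _end, _value in RANGES]

def lookup(addr, default=None):
    """Find the range whose start is the largest one <= addr, then check its end."""
    n = ip(addr)
    i = bisect.bisect_right(STARTS, n) - 1
    if i >= 0 and n <= RANGES[i][1]:
        return RANGES[i][2]
    return default

print(lookup("10.1.2.3"))        # 1
print(lookup("10.255.255.127"))  # 2
print(lookup("11.0.0.0"))        # None
```

The data stays in one compact sorted array, which is in line with the point about fewer TLB and cache misses, though a Python sketch of course says nothing about real timings.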
> The only inconvenient thing with ranges is overriding ranges to correct
> external base errors. For example, to correct
>
> 10.0.0.1-10.0.0.1 1;
> 10.0.0.2-10.0.0.254 2;
> 10.0.0.254-10.0.0.255 1;
>
> something like this should be used:
>
> 10.0.0.1-10.0.0.1 delete;
> 10.0.0.2-10.0.0.254 delete;
> 10.0.0.254-10.0.0.255 delete;
> 10.0.0.1-10.0.0.255 1;
As far as I understand, with CIDRs one has to define an identical
CIDR to override an erroneous one anyway, no? What's wrong with the same
approach applied to ranges?
E.g.
10.0.0.2-10.0.0.254 1;
should be enough to correct error in the above example, as in
10.0.0.0/24 2;
10.0.0.1/32 1;
10.0.0.254/31 1;
it's enough to add
10.0.0.0/24 1;
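
A small sketch of the CIDR case, under two assumptions of mine: a later definition of an identical CIDR replaces the earlier one, and lookup is longest-prefix match. This is a toy model, not the geo module's actual code, and build_table / cidr_lookup are hypothetical names.

```python
import ipaddress

def build_table(entries):
    """Map each CIDR to its value; a repeated CIDR replaces the earlier one."""
    table = {}
    for cidr, value in entries:
        table[ipaddress.ip_network(cidr)] = value
    return table

def cidr_lookup(table, addr):
    """Longest-prefix match: try /32 first, then progressively shorter prefixes."""
    for prefix in range(32, -1, -1):
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        if net in table:
            return table[net]
    return None

# The erroneous external base, then the same base plus the one-line override.
external = [("10.0.0.0/24", "2"), ("10.0.0.1/32", "1"), ("10.0.0.254/31", "1")]
base = build_table(external)
fixed = build_table(external + [("10.0.0.0/24", "1")])

print(cidr_lookup(base, "10.0.0.100"))   # 2
print(cidr_lookup(fixed, "10.0.0.100"))  # 1
```

In this model the single added line is enough: every address in 10.0.0.0/24 resolves to 1 afterwards.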
The only problem I see with ranges is that if somebody adds
something like
10.0.0.2-10.0.0.2 1;
to the original database, they will probably also change
10.0.0.2-10.0.0.254 to 10.0.0.3-10.0.0.254, and private
overrides will in turn need to be updated.
Maxim Dounin