GeoIPCity with nginx

Maxim Dounin mdounin at mdounin.ru
Sun Nov 23 19:37:23 MSK 2008


Hello!

On Sun, Nov 23, 2008 at 02:12:25PM +0300, Igor Sysoev wrote:

> On Sun, Nov 23, 2008 at 12:05:48AM +0300, Maxim Dounin wrote:
> 
> > On Sat, Nov 22, 2008 at 05:18:31PM +0100, Bobby Dr wrote:
> > 
> > > I know Maxmind's GeoIP Country database can be used easily with nginx.
> > > But what about their Geo-City database?
> > > 
> > > The default CSV database stands at > 100MB in size (and will grow even
> > > larger if the two normalized files are merged together). For this
> > > reason, using the CIDR format may not be feasible (due to excessive
> > > memory requirement)
> > > 
> > > The binary file is much smaller however.
> > 
> > The problem with Maxmind's city database, afaik, is that the text 
> > information they provide isn't CIDRs but IP ranges.  This is 
> > generally good for relational databases, but the worst case for those 
> > who are able to work with CIDRs.
> > 
> > The binary file, afaik, is a radix tree dump with real CIDRs, that's 
> > why it's much smaller.
> >
> > Theoretically it should be possible to collapse IP ranges to an 
> > optimal set of CIDRs to make this usable with the native nginx geo 
> > module, but this isn't really an easy task.
> 
> No, a single IP allocation may be equal to several CIDRs,
> for example, some time ago I saw this IP range:
> 
> inetnum:        94.25.31.248 - 94.25.43.251
> 
> that is equal to 10 CIDRs:
> 
> 94.25.31.248/29
> 94.25.32.0/21
> 94.25.40.0/23
> 94.25.42.0/24
> 94.25.43.0/25
> 94.25.43.128/26
> 94.25.43.192/27
> 94.25.43.224/28
> 94.25.43.240/29
> 94.25.43.248/30
> 
> And if you convert the Maxmind GeoCity base file
> GeoLiteCity_20081101/GeoLiteCity-Blocks.csv, which has 3014818 IP ranges,
> you will get 4125519 CIDRs - over one third more.
> 
> The increase is due to IP allocations like the one I have shown above
> and due to Maxmind errors - they may split a single CIDR into 3 ranges, as in:
> 
> 10.0.0.1-10.0.0.1
> 10.0.0.2-10.0.0.254
> 10.0.0.254-10.0.0.255
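
The range-to-CIDR expansion quoted above can be reproduced with Python's
standard ipaddress module. This is only a minimal sketch for checking the
arithmetic, not the tool Igor used; the helper name range_to_cidrs is made
up for illustration.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Return the minimal list of CIDR blocks covering first..last inclusive."""
    return list(ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)))

# The inetnum range from the message above.
cidrs = range_to_cidrs("94.25.31.248", "94.25.43.251")
for net in cidrs:
    print(net)
```

Running it prints the same 10 blocks as listed in the quoted message, from
94.25.31.248/29 through 94.25.43.248/30.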

This is somewhat obvious.  I'm talking about the situation where a 
small number of CIDRs produces multiple non-overlapping ranges, e.g.

10.0.0.0/8         1;
10.255.255.127/32  2;

will result in the following ranges:

10.0.0.0-10.255.255.126        1;
10.255.255.127-10.255.255.127  2;
10.255.255.128-10.255.255.255  1;

and this in turn will result in a huge number of CIDRs.
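
To put a number on "huge", the three ranges above can be converted back to
CIDRs and counted. This is a rough sketch using Python's ipaddress module;
the helper name cidr_count is hypothetical and the code only illustrates
the blow-up, it is not part of nginx or of any Maxmind tooling.

```python
import ipaddress

def cidr_count(first, last):
    """Number of CIDR blocks needed to cover first..last inclusive."""
    return sum(1 for _ in ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)))

# The three ranges produced from the two original CIDR entries.
ranges = [
    ("10.0.0.0", "10.255.255.126"),
    ("10.255.255.127", "10.255.255.127"),
    ("10.255.255.128", "10.255.255.255"),
]
counts = [cidr_count(first, last) for first, last in ranges]
print(counts, sum(counts))
```

The first range alone needs 23 blocks, so the two original CIDR entries
turn into 25 after the round trip through ranges.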

> > > Has anybody been able to use the geo-city database with nginx? For
> > > apache MaxMind provides mod_geoip which works on the binary file, making
> > > it very fast.
> > > 
> > > Does anyone have any solution (like mod_geoip) for nginx? I'm using PECL
> > > geoip for PHP and the one for ruby. But I feel, geo lookup at the server
> > > level would be much faster.
> 
> Last week I sped up loading of huge geo bases (like Maxmind's one);
> it will be in 0.7.23. However, the memory footprint is large: the Maxmind
> base takes about 250M on i386 (fortunately, the memory is shared between
> master and workers on a VM copy-on-write basis).

Sounds good anyway. :)

> Yesterday I investigated using ranges instead of CIDRs: the in-memory base
> will take about 25M, like Maxmind's one. However, the memory footprint in
> top will be the same, as modern malloc()s in FreeBSD and probably Linux
> lazily free memory using madvise(MADV_FREE), and nginx uses a lot of
> memory while handling the base on reconfiguration.
> 
> The search should be as fast as a simple radix tree, maybe even faster:
> the simple radix tree goes through a short loop, but it causes tens of TLB
> and cache misses, while searching for a suitable range goes through a
> longer loop, but it causes only several TLB and cache misses.

This should be an interesting alternative for range-centric bases.
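
For illustration only, a range lookup could look roughly like the sketch
below. It assumes a sorted array of non-overlapping ranges searched with
binary search; the thread does not say how nginx implements it, so the
layout and the names build_table and lookup are assumptions, and Python
says nothing about the TLB or cache behaviour discussed above.

```python
import bisect
import ipaddress

def build_table(entries):
    """entries: iterable of (first, last, value), ranges must not overlap."""
    rows = sorted((int(ipaddress.IPv4Address(first)),
                   int(ipaddress.IPv4Address(last)), value)
                  for first, last, value in entries)
    starts = [row[0] for row in rows]  # sorted range starts for bisect
    return starts, rows

def lookup(table, addr, default=None):
    """Return the value of the range containing addr, or default."""
    starts, rows = table
    ip = int(ipaddress.IPv4Address(addr))
    i = bisect.bisect_right(starts, ip) - 1  # last range starting <= ip
    if i >= 0 and rows[i][0] <= ip <= rows[i][1]:
        return rows[i][2]
    return default

table = build_table([
    ("10.0.0.0", "10.255.255.126", 1),
    ("10.255.255.127", "10.255.255.127", 2),
    ("10.255.255.128", "10.255.255.255", 1),
])
print(lookup(table, "10.1.2.3"), lookup(table, "10.255.255.127"))
```

Three rows are enough here, where the CIDR form of the same data needs 25
entries.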

> The only unhandy thing with ranges is range overriding to correct
> external base errors. For example, to correct
> 
> 10.0.0.1-10.0.0.1	1;
> 10.0.0.2-10.0.0.254	2;
> 10.0.0.254-10.0.0.255	1;
> 
> something like this should be used:
> 
> 10.0.0.1-10.0.0.1	delete;
> 10.0.0.2-10.0.0.254	delete;
> 10.0.0.254-10.0.0.255	delete;
> 10.0.0.1-10.0.0.255	1;

As far as I understand, with CIDRs one has to define an identical 
CIDR to override an erroneous one anyway, no?  What's wrong with the 
same approach applied to ranges?

E.g.

10.0.0.2-10.0.0.254   1;

should be enough to correct the error in the above example, as in 

10.0.0.0/24   2;
10.0.0.1/32   1;
10.0.0.254/31 1;

it's enough to add

10.0.0.0/24   1;
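
The CIDR case can be sketched as follows. The sketch assumes that a later
definition with an identical CIDR replaces the earlier one and that lookups
pick the longest matching prefix; the dict and the linear scan are for
illustration only and are not how the nginx geo module stores its data.

```python
import ipaddress

def lookup(table, addr):
    """Longest-prefix match over a dict of network -> value."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in table if ip in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]

# The erroneous base from the example above.
table = {
    ipaddress.ip_network("10.0.0.0/24"): 2,
    ipaddress.ip_network("10.0.0.1/32"): 1,
    ipaddress.ip_network("10.0.0.254/31"): 1,
}
before = lookup(table, "10.0.0.100")

# Override: the same key replaces the erroneous /24 entry.
table[ipaddress.ip_network("10.0.0.0/24")] = 1
after = lookup(table, "10.0.0.100")
print(before, after)
```

Only the /24 entry has to be redefined; the more specific /32 and /31
entries already carry the right value and are left alone.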

The only problem with ranges I see is that if somebody adds 
something like

10.0.0.2-10.0.0.2      1;

to the original database, he will probably also change 
10.0.0.2-10.0.0.254 to 10.0.0.3-10.0.0.254, and private 
modifications will in turn require modifications.

Maxim Dounin




