GeoIPCity with nginx

Igor Sysoev
Sun Nov 23 14:12:25 MSK 2008

On Sun, Nov 23, 2008 at 12:05:48AM +0300, Maxim Dounin wrote:

> On Sat, Nov 22, 2008 at 05:18:31PM +0100, Bobby Dr wrote:
> > I know Maxmind's GeoIP Country database can be used easily with nginx.
> > But what about their Geo-City database?
> > 
> > The default CSV database stands at > 100MB in size (and will grow even
> > larger if the two normalized files are merged together). For this
> > reason, using the CIDR format may not be feasible (due to excessive
> > memory requirement)
> > 
> > The binary file is much smaller however.
> The problem with maxmind's city database afaik is that the text 
> information they provide isn't cidrs, but ip ranges.  This is 
> generally good for relational databases, but the worst case for those 
> who are able to work with cidrs.
> The binary file afaik is a radix tree dump with real cidrs, that's why 
> it's much smaller.
> Theoretically it should be possible to collapse ip ranges to an 
> optimal set of cidrs to make this usable with the native nginx geo 
> module, but this isn't really an easy task.

No, a single IP allocation may be equal to several CIDRs;
for example, some time ago I saw this IP range:

inetnum: -

that is equal to 10 CIDRs:

And if you convert the Maxmind GeoCity base file
GeoLiteCity_20081101/GeoLiteCity-Blocks.csv, which has 3014818 IP ranges,
you will get 4125519 CIDRs, about one third more.
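
As a rough illustration of why one range can expand into several CIDRs,
here is a minimal Python sketch using the standard ipaddress module.
The addresses are made up for the example (they are not the range from
the message above), and the helper name range_to_cidrs is hypothetical.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Return CIDR blocks that exactly cover first..last (inclusive)."""
    return list(ipaddress.summarize_address_range(
        ipaddress.ip_address(first),
        ipaddress.ip_address(last),
    ))

# Hypothetical range: six addresses that do not start on a power-of-two
# boundary, so no single CIDR can cover them exactly.
for net in range_to_cidrs("10.0.0.5", "10.0.0.10"):
    print(net)
# 10.0.0.5/32
# 10.0.0.6/31
# 10.0.0.8/31
# 10.0.0.10/32
```

A range that is aligned to a block boundary, such as 10.0.0.0 - 10.0.0.255,
collapses to a single CIDR (10.0.0.0/24); unaligned ranges are what
inflate the CIDR count.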

The increase is due to IP allocations like the one I showed above
and due to Maxmind errors: they may split a single CIDR into 3 ranges, as:

> > Has anybody been able to use the geo-city database with nginx? For
> > apache MaxMind provides mod_geoip which works on the binary file, making
> > it very fast.
> > 
> > Does anyone have any solution (like mod_geoip) for nginx? I'm using PECL
> > geoip for PHP and the one for ruby. But I feel, geo lookup at the server
> > level would be much faster.

Last week I sped up loading of a huge geo base (like Maxmind's one);
it will be in 0.7.23. However, the memory footprint is large: the Maxmind
base takes about 250M on i386 (fortunately, the memory is shared between
master and workers on a VM copy-on-write basis).

Yesterday I investigated using ranges instead of CIDRs: the in-memory base
will take about 25M, like Maxmind's one. However, the memory footprint in
top will be the same, as modern malloc()s in FreeBSD and probably Linux
lazily free memory using madvise(MADV_FREE), and nginx uses a lot of memory
while handling the base on reconfiguration.

The search should be as fast as a simple radix tree, maybe even faster:
the simple radix tree goes through a short loop, but it causes tens of TLB
and cache misses, while searching for a suitable range goes through a longer
loop, but it causes only several TLB and cache misses.
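
To make the idea of looking up an address in a table of ranges concrete,
here is a minimal Python sketch. It uses binary search over sorted,
non-overlapping ranges; that is an assumption made for the illustration,
not a description of how nginx searches its ranges. The class name
RangeTable and the sample data are hypothetical.

```python
import bisect
import ipaddress

class RangeTable:
    """Sorted, non-overlapping (first, last, value) ranges with lookup."""

    def __init__(self, ranges):
        # Convert dotted addresses to integers and sort by range start.
        rows = sorted(
            ((int(ipaddress.ip_address(a)), int(ipaddress.ip_address(b)), v)
             for a, b, v in ranges),
            key=lambda r: r[0],
        )
        self.starts = [r[0] for r in rows]
        self.ends = [r[1] for r in rows]
        self.values = [r[2] for r in rows]

    def lookup(self, ip, default=None):
        n = int(ipaddress.ip_address(ip))
        # Index of the last range whose start is <= n, or -1 if none.
        i = bisect.bisect_right(self.starts, n) - 1
        if i >= 0 and n <= self.ends[i]:
            return self.values[i]
        return default

# Made-up sample data.
table = RangeTable([
    ("10.0.0.0", "10.0.0.99", "1"),
    ("10.0.0.100", "10.0.0.199", "2"),
])
print(table.lookup("10.0.0.150"))  # 2
print(table.lookup("10.0.1.0"))    # None
```

The three parallel lists keep the range starts contiguous in memory, so
a lookup touches only a few places in memory rather than one per tree
level.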

The only unhandy thing with ranges is range overriding to correct
external base errors. For example, to correct

	1;
	2;
	1;

something like this should be used:

	delete;
	delete;
	delete;
	1;

Igor Sysoev
