GeoIPCity with nginx
mdounin at mdounin.ru
Sun Nov 23 19:37:23 MSK 2008
On Sun, Nov 23, 2008 at 02:12:25PM +0300, Igor Sysoev wrote:
> On Sun, Nov 23, 2008 at 12:05:48AM +0300, Maxim Dounin wrote:
> > On Sat, Nov 22, 2008 at 05:18:31PM +0100, Bobby Dr wrote:
> > > I know Maxmind's GeoIP Country database can be used easily with nginx.
> > > But what about their Geo-City database?
> > >
> > > The default CSV database stands at > 100MB in size (and will grow even
> > > larger if the two normalized files are merged together). For this
> > > reason, using the CIDR format may not be feasible (due to excessive
> > > memory requirements).
> > >
> > > The binary file is much smaller however.
> > The problem with MaxMind's city database, AFAIK, is that the text
> > information they provide isn't CIDRs but IP ranges. This is
> > generally good for relational databases, but the worst case for those
> > who work with CIDRs.
> > The binary file, AFAIK, is a radix tree dump with real CIDRs; that's
> > why it's much smaller.
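For readers unfamiliar with the structure: a binary radix tree over IPv4 prefixes stores one node per prefix bit, and a lookup walks from the most significant bit down, remembering the value of the deepest (longest) matching prefix. The Python sketch below is a minimal, hypothetical illustration of that idea (the class and method names are made up); it is not the layout of MaxMind's binary file or of nginx's own radix tree code.

```python
import ipaddress


class Node:
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = [None, None]  # child for bit 0 and bit 1
        self.value = None             # set only if a prefix ends here


class RadixTree:
    """Hypothetical one-bit-per-level trie keyed by IPv4 prefixes."""

    def __init__(self):
        self.root = Node()

    def insert(self, cidr, value):
        net = ipaddress.IPv4Network(cidr)
        addr = int(net.network_address)
        node = self.root
        # Walk (and create) one node per prefix bit, most significant first.
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = Node()
            node = node.children[bit]
        node.value = value

    def lookup(self, ip):
        addr = int(ipaddress.IPv4Address(ip))
        node = self.root
        best = node.value  # a /0 entry, if any, is the fallback
        for i in range(32):
            node = node.children[(addr >> (31 - i)) & 1]
            if node is None:
                break
            if node.value is not None:
                best = node.value  # deeper match = longer prefix
        return best
```

With `10.0.0.0/8` mapped to `"A"` and `10.1.0.0/16` mapped to `"B"`, looking up `10.1.2.3` returns `"B"` and `10.200.0.1` returns `"A"`. Each lookup may follow up to 32 pointers to nodes scattered in memory.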
> > Theoretically it should be possible to collapse IP ranges into an
> > optimal set of CIDRs to make this usable with the native nginx geo
> > module, but this isn't an easy task.
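Converting a single range into the smallest set of CIDRs that covers it exactly is straightforward with Python's standard `ipaddress.summarize_address_range`; the sketch below wraps it and emits `network value;` lines of the kind an nginx `geo` block accepts. The helper names are hypothetical, and this handles only one range at a time, not the harder whole-database optimization discussed above.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Return the smallest list of CIDR blocks that exactly covers the
    inclusive IPv4 range first..last (dotted-quad strings)."""
    nets = ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last))
    return [str(net) for net in nets]


def range_to_geo_lines(first, last, value):
    """Emit one 'network value;' line per CIDR for an nginx geo block."""
    return [f"{cidr} {value};" for cidr in range_to_cidrs(first, last)]


if __name__ == "__main__":
    # A range aligned on a /24 boundary collapses to a single CIDR...
    print(range_to_cidrs("10.0.0.0", "10.0.0.255"))
    # ...while an unaligned range of almost the same size needs many.
    print(len(range_to_cidrs("10.0.0.1", "10.0.0.254")))
```

The aligned range yields just `10.0.0.0/24`, while `10.0.0.1-10.0.0.254` needs 14 CIDRs (from `10.0.0.1/32` up to `10.0.0.64/26` and back down to `10.0.0.254/32`), which is why range-based data tends to grow when converted.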
> No, single IP allocations may be equal to several CIDRs,
> for example, some time ago I saw this IP range:
> inetnum: 18.104.22.168 - 22.214.171.124
> that is equal to 10 CIDRs:
> And if you convert the MaxMind GeoCity base file
> GeoLiteCity_20081101/GeoLiteCity-Blocks.csv, which has 3014818 IP ranges,
> you will get 4125519 CIDRs, roughly one third more.
> The increase is due to IP allocations like the one I showed above,
> and due to MaxMind errors: they may split a single CIDR into 3 ranges, such as:
This is somewhat obvious. I'm talking about the situation where multiple
non-overlapping ranges are produced from a small number of CIDRs, e.g.
will result in the following ranges:
and this in turn will result in a huge number of CIDRs.
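One plausible reading of the (lost) example is a base that assigns one value to a large CIDR and a different value to a more specific CIDR nested inside it. Flattened into non-overlapping ranges, the outer block splits around the inner one, and the leftover pieces are usually not CIDR-aligned. The sketch below (hypothetical function names; a naive quadratic algorithm chosen for clarity, not efficiency) flattens nested prefixes with longest-prefix-wins semantics and counts the CIDRs needed to express the result again.

```python
import ipaddress


def flatten(cidrs):
    """Turn (cidr, value) pairs, where a more specific prefix overrides a
    less specific one, into sorted non-overlapping (first, last, value)
    ranges. Naive O(n^2) version, for illustration only."""
    nets = [(ipaddress.IPv4Network(c), v) for c, v in cidrs]
    # Every prefix start and every address just past a prefix end is a
    # point where the effective value may change.
    bounds = sorted({int(n.network_address) for n, _ in nets} |
                    {int(n.broadcast_address) + 1 for n, _ in nets})
    ranges = []
    for start, nxt in zip(bounds, bounds[1:]):
        covering = [(n.prefixlen, v) for n, v in nets
                    if int(n.network_address) <= start <= int(n.broadcast_address)]
        if not covering:
            continue  # gap: no prefix applies here
        value = max(covering, key=lambda pv: pv[0])[1]  # longest prefix wins
        if ranges and ranges[-1][1] == start - 1 and ranges[-1][2] == value:
            ranges[-1] = (ranges[-1][0], nxt - 1, value)  # merge neighbours
        else:
            ranges.append((start, nxt - 1, value))
    return [(str(ipaddress.IPv4Address(a)), str(ipaddress.IPv4Address(b)), v)
            for a, b, v in ranges]


def count_cidrs(ranges):
    """Number of CIDR blocks needed to express the given ranges exactly."""
    return sum(len(list(ipaddress.summarize_address_range(
                   ipaddress.IPv4Address(a), ipaddress.IPv4Address(b))))
               for a, b, _ in ranges)
```

With `10.0.0.0/8` mapped to `"A"` and `10.1.0.0/16` mapped to `"B"`, flattening gives three ranges, and re-expressing them as CIDRs takes 9 blocks instead of the original 2: the tail `10.2.0.0-10.255.255.255` alone needs seven (`/15` through `/9`).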
> > > Has anybody been able to use the geo-city database with nginx? For
> > > Apache, MaxMind provides mod_geoip, which works on the binary file, making
> > > it very fast.
> > >
> > > Does anyone have any solution (like mod_geoip) for nginx? I'm using PECL
> > > geoip for PHP and the equivalent for Ruby. But I feel geo lookup at the
> > > server level would be much faster.
> Last week I sped up the loading of huge geo bases (like MaxMind's);
> it will be in 0.7.23. However, the memory footprint is large: the MaxMind base
> takes about 250M on i386 (fortunately, the memory is shared between the master
> and workers on a VM copy-on-write basis).
Sounds good anyway. :)
> Yesterday I investigated using ranges instead of CIDRs: the in-memory base
> will take about 25M, like MaxMind's own. However, the memory footprint shown
> by top will be the same, because modern malloc()s in FreeBSD (and probably
> Linux) free memory lazily using madvise(MADV_FREE), and nginx uses a lot of
> memory while processing the base during reconfiguration.
> The search should be as fast as with a simple radix tree, maybe even faster:
> the simple radix tree goes through a short loop but causes tens of TLB
> and cache misses, while searching for a suitable range goes through a longer
> loop but causes only a few TLB and cache misses.
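A sketch of one range-table layout that fits the "longer loop, fewer misses" description: bucket the ranges by the high 16 bits of the address, then scan a short sorted list of low-16-bit sub-ranges inside the bucket. This is an assumption for illustration (the class name and bucketing scheme are hypothetical), not a claim about nginx's actual implementation; in Python only the structure is visible, not the cache behaviour a packed C array would give.

```python
import ipaddress


class TwoLevelRangeTable:
    """Hypothetical table: 65536 buckets indexed by the high 16 bits of an
    IPv4 address; each bucket is a short list of (low_start, low_end, value)
    sub-ranges over the low 16 bits. Input ranges must not overlap."""

    def __init__(self, ranges):
        self.buckets = [None] * 0x10000
        for first, last, value in ranges:
            a = int(ipaddress.IPv4Address(first))
            b = int(ipaddress.IPv4Address(last))
            # Split the range at every 65536-address boundary it crosses.
            while a <= b:
                high = a >> 16
                seg_end = min(b, (high << 16) | 0xFFFF)
                if self.buckets[high] is None:
                    self.buckets[high] = []
                self.buckets[high].append((a & 0xFFFF, seg_end & 0xFFFF, value))
                a = seg_end + 1
        for bucket in self.buckets:
            if bucket:
                bucket.sort(key=lambda r: r[0])

    def lookup(self, ip, default=None):
        addr = int(ipaddress.IPv4Address(ip))
        bucket = self.buckets[addr >> 16]  # one jump to the right bucket
        if bucket:
            low = addr & 0xFFFF
            # Linear scan over one small list: a longer loop than a trie
            # walk, but over data that sits together in memory.
            for start, end, value in bucket:
                if start > low:
                    break  # sorted by start: no later entry can match
                if low <= end:
                    return value
        return default
```

Loaded with the ranges `10.0.0.1-10.0.0.1 → "1"` and `10.0.0.2-10.0.0.254 → "2"`, a lookup of `10.0.0.100` returns `"2"`, while `10.0.0.255` falls through to the default.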
This should be an interesting alternative for range-centric bases.
> The only inconvenient thing about ranges is overriding them to correct
> errors in an external base. For example, to correct
> 10.0.0.1-10.0.0.1 1;
> 10.0.0.2-10.0.0.254 2;
> 10.0.0.254-10.0.0.255 1;
> something like this should be used:
> 10.0.0.1-10.0.0.1 delete;
> 10.0.0.2-10.0.0.254 delete;
> 10.0.0.254-10.0.0.255 delete;
> 10.0.0.1-10.0.0.255 1;
As far as I understand, with CIDRs one has to define an identical CIDR
anyway to override an erroneous one, no? What's wrong with applying the
same approach to ranges?
should be enough to correct the error in the above example, as in
it's enough to add
The only problem with ranges I see is that if somebody adds
to the original database, they will probably also modify
10.0.0.2-10.0.0.254 to be 10.0.0.3-10.0.0.254, and private
modifications will in turn need to be modified.
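A minimal sketch of the override semantics argued for above, assuming an explicitly added range simply takes precedence over any base ranges it overlaps, the way a more specific CIDR does. The function name `apply_override` is hypothetical, and this is not a description of how the nginx geo module resolves overlaps: overlapping base ranges are trimmed or split, and all other ranges are left untouched.

```python
import ipaddress


def _ip(s):
    return int(ipaddress.IPv4Address(s))


def _str(n):
    return str(ipaddress.IPv4Address(n))


def apply_override(ranges, first, last, value):
    """Return a new sorted list of (first, last, value) ranges in which
    first..last maps to value. Base ranges overlapping the override are
    trimmed or split around it; ranges outside it are kept as-is."""
    lo, hi = _ip(first), _ip(last)
    if lo > hi:
        raise ValueError("empty override range")
    out = []
    for a_s, b_s, v in ranges:
        a, b = _ip(a_s), _ip(b_s)
        if b < lo or a > hi:
            out.append((a, b, v))       # no overlap: keep unchanged
            continue
        if a < lo:
            out.append((a, lo - 1, v))  # keep the part left of the override
        if b > hi:
            out.append((hi + 1, b, v))  # keep the part right of the override
    out.append((lo, hi, value))
    out.sort(key=lambda r: r[0])
    return [(_str(a), _str(b), v) for a, b, v in out]
```

Applied to the erroneous example with an override of `10.0.0.1-10.0.0.255 → 1`, all three base ranges fall inside it and are replaced by the single override range. An override in the middle of a base range, such as `10.0.0.100-10.0.0.110` inside `10.0.0.2-10.0.0.254`, splits that range into two pieces around it.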