Masking IP-Addresses on logging?

Tobia Conforto tobia.conforto at gmail.com
Fri Feb 19 01:27:01 MSK 2010


Nick Pearson wrote:
> It's humorous when politicians try to regulate the openness of the Internet.

Humorous wouldn't be my choice of words.

> You can change the log format with something like this:
> 
> log_format no_ip '0.0.0.0 - no-user [$time_local] '

No. This will break most log-based analytics software, which uses the source address to tell visitors apart in order to generate 'visit', 'page', and 'hit' statistics.

I guess the only feasible way to have some kind of plausible deniability in court *and* to get your statistics is to use the result of a one-way hash function as a fake IP address.

MD4 and MD5 would do a fine job, but they are probably way too heavy to run for every request on a web server. CRC-32 is lighter and outputs a 32-bit code, which could be reformatted as a fake IP address.

I suggest writing a custom module that will compute the CRC-32 of "$remote_addr$http_user_agent" (just to get some variance over the IP address alone), then format it in the dotted-decimal form of IP addresses and make it available as a new variable, say $fake_remote_address.
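
To make the idea concrete, here is a minimal sketch of the computation in Python rather than as an actual nginx module. The function name fake_remote_address and its two parameters are made up for illustration; they stand in for the client address and the User-Agent header that the module would read from the request.

```python
import zlib


def fake_remote_address(remote_addr: str, user_agent: str) -> str:
    """Map the client address plus User-Agent to a fake dotted-decimal IP.

    Hypothetical helper: it only illustrates the CRC-32 idea and is not
    part of nginx.
    """
    # Concatenate the two inputs, as in "$remote_addr$http_user_agent".
    data = (remote_addr + user_agent).encode("utf-8")
    # zlib.crc32 returns an unsigned 32-bit value on Python 3; the mask
    # just makes that explicit.
    crc = zlib.crc32(data) & 0xFFFFFFFF
    # Split the 32-bit value into four octets, most significant first.
    octets = crc.to_bytes(4, "big")
    return ".".join(str(b) for b in octets)


if __name__ == "__main__":
    print(fake_remote_address("192.0.2.10", "Mozilla/5.0"))
```

The same visitor with the same browser always gets the same fake address, so analytics software can still group requests into visits.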

This won't prevent anyone from reversing the code. No function will in this case, not even MD5, because the input space is only 2^32 plain texts and you can always brute-force your way through it. If you used the User-Agent as an input to the hash function and did *not* store it in the log, it would get slightly better... But I guess CRC-32 will be enough for any lawyer :-)
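
To illustrate the brute-force point, here is a rough Python sketch that assumes the hash was computed over the IP address alone (no User-Agent). It searches a small network instead of all 2^32 addresses to keep the run time trivial; the helper names crc_as_ip and candidates are invented for this example.

```python
import ipaddress
import zlib
from typing import List


def crc_as_ip(text: str) -> str:
    """Return the CRC-32 of text formatted as a dotted-decimal address."""
    crc = zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(crc))


def candidates(fake_ip: str, network: str) -> List[str]:
    """List every address in network whose CRC-32 maps to fake_ip."""
    return [
        str(addr)
        for addr in ipaddress.ip_network(network)
        if crc_as_ip(str(addr)) == fake_ip
    ]


if __name__ == "__main__":
    logged = crc_as_ip("192.0.2.77")  # what would end up in the log
    print(candidates(logged, "192.0.2.0/24"))  # the real address is among these
```

Mixing in a User-Agent that never reaches the log means an attacker has to guess that string as well, which is why it helps a little.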

Tobia

