Masking IP-Addresses on logging?
tobia.conforto at gmail.com
Fri Feb 19 01:27:01 MSK 2010
Nick Pearson wrote:
> It's humorous when politicians try to regulate the openness of the Internet.
Humorous wouldn't be my choice of words.
> You can change the log format with something like this:
> log_format no_ip '0.0.0.0 - no-user [$time_local] '
No. This will break most log-based analytics software, which uses the source address to tell visitors apart in order to generate 'visit', 'page', and 'hit' statistics.
I guess the only feasible way to have some kind of plausible deniability in court *and* still get your statistics is to log the result of a one-way hash function as a fake IP address.
MD4 and MD5 would do a fine job, but they are probably way too heavy to use on a web server. CRC-32 is lighter and outputs a 32-bit code, the same size as an IPv4 address, so it could be reformatted as a fake IP address.
I suggest writing a custom module that computes the CRC-32 of "$remote_addr$http_user_agent" (the User-Agent is there just to get some variance over the IP address alone), formats it in the dotted-decimal form of an IP address, and makes it available as $fake_remote_address.
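To make the idea concrete, here is a rough sketch of the computation in Python rather than as an actual nginx module. The function name fake_remote_address is made up for illustration, and plain concatenation of the client address and the User-Agent is an assumption about how the two inputs would be combined.

```python
import ipaddress
import zlib


def fake_remote_address(remote_addr, user_agent):
    """Map a client address plus User-Agent to a fake dotted-decimal address.

    Hypothetical helper: concatenates the two strings, takes the CRC-32
    and renders the 32-bit result as an IPv4 address.
    """
    data = (remote_addr + user_agent).encode("utf-8")
    checksum = zlib.crc32(data) & 0xFFFFFFFF  # force an unsigned 32-bit value
    return str(ipaddress.IPv4Address(checksum))


if __name__ == "__main__":
    # The same visitor always maps to the same fake address,
    # so per-visitor statistics keep working.
    print(fake_remote_address("198.51.100.7", "Mozilla/5.0"))
```

Because the mapping is deterministic, analytics software still sees one stable "address" per visitor and browser combination, just not the real one.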
This won't keep anyone from reversing the code. No function will in this case, not even MD5, because you can always brute-force your way through: there are only 2^32 possible plain texts. If you used the User-Agent as an input to the hash function and did *not* store it in the log, it would get slightly better... But I guess CRC-32 will be enough for any lawyer :-)
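The brute-force point can be sketched the same way. This assumes the simplest case, where only the textual IP address was hashed (no User-Agent), and it searches a single /24 instead of the whole 2^32 space to keep the run short. The names crc_of_ip and reverse_in_network are hypothetical.

```python
import ipaddress
import zlib


def crc_of_ip(addr):
    """CRC-32 of the textual form of an address (assumed hashing scheme)."""
    return zlib.crc32(addr.encode("ascii")) & 0xFFFFFFFF


def reverse_in_network(target_crc, network):
    """Return every address in `network` whose CRC-32 equals target_crc."""
    return [
        str(ip)
        for ip in ipaddress.ip_network(network)
        if crc_of_ip(str(ip)) == target_crc
    ]


if __name__ == "__main__":
    logged = crc_of_ip("192.0.2.57")  # what would end up in the log
    print(reverse_in_network(logged, "192.0.2.0/24"))
```

Scanning all of IPv4 is the same loop over 0.0.0.0/0, which is slow in Python but entirely feasible in general. Mixing in a User-Agent that is not logged enlarges the search space, which is the "slightly better" case above.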