how to separate robot access log and human access log
meteor8488
nginx-forum at nginx.us
Mon Apr 27 23:45:17 UTC 2015
Hi all,
I'm trying to separate the robot access log from the human access log, so I'm
using the configuration below:
http {
    ....
    map $http_user_agent $ifbot {
        default 0;
        "~*rogerbot" 3;
        "~*ChinasoSpider" 3;
        "~*Yahoo" 1;
        "~*Bot" 1;
        "~*Spider" 1;
        "~*archive" 1;
        "~*search" 1;
        "~*Yahoo" 1;
        "~Mediapartners-Google" 1;
        "~*bingbot" 1;
        "~*YandexBot" 1;
        "~*Feedly" 2;
        "~*Superfeedr" 2;
        "~*QuiteRSS" 2;
        "~*g2reader" 2;
        "~*Digg" 2;
        "~*trendiction" 3;
        "~*AhrefsBot" 3;
        "~*curl" 3;
        "~*Ruby" 3;
        "~*Player" 3;
        "~*Go\ http\ package" 3;
        "~*Lynx" 3;
        "~*Sleuth" 3;
        "~*Python" 3;
        "~*Wget" 3;
        "~*perl" 3;
        "~*httrack" 3;
        "~*JikeSpider" 3;
        "~*PHP" 3;
        "~*WebIndex" 3;
        "~*magpie-crawler" 3;
        "~*JUC" 3;
        "~*Scrapy" 3;
        "~*libfetch" 3;
        "~*WinHTTrack" 3;
        "~*htmlparser" 3;
        "~*urllib" 3;
        "~*Zeus" 3;
        "~*scan" 3;
        "~*Indy\ Library" 3;
        "~*libwww-perl" 3;
        "~*GetRight" 3;
        "~*GetWeb!" 3;
        "~*Go!Zilla" 3;
        "~*Go-Ahead-Got-It" 3;
        "~*Download\ Demon" 3;
        "~*TurnitinBot" 3;
        "~*WebscanSpider" 3;
        "~*WebBench" 3;
        "~*YisouSpider" 3;
        "~*check_http" 3;
        "~*webmeup-crawler" 3;
        "~*omgili" 3;
        "~*blah" 3;
        "~*fountainfo" 3;
        "~*MicroMessenger" 3;
        "~*QQDownload" 3;
        "~*shoulu.jike.com" 3;
        "~*omgilibot" 3;
        "~*pyspider" 3;
    }
    ....
}
And in the server part, I'm using:
if ($ifbot = "1") {
    set $spiderbot 1;
}
if ($ifbot = "2") {
    set $rssbot 1;
}
if ($ifbot = "3") {
    return 403;
    access_log /web/log/badbot.log main;
}
access_log /web/log/location_access.log main;
access_log /web/log/spider_access.log main if=$spiderbot;
access_log /web/log/rssbot_access.log main if=$rssbot;
But it seems that nginx still writes some robot requests to both
location_access.log and spider_access.log.
How can I separate the logs for the robots?
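
For reference, one way this separation might be expressed is sketched below. An access_log directive without an if= parameter logs every request, which would explain why spider requests also land in location_access.log. With if=, a request is skipped when the condition evaluates to "0" or an empty string. The variables $log_human, $log_spider, $log_rss and $log_badbot are hypothetical names introduced only for this sketch, and the layout assumes that $ifbot is the map defined above and that no location overrides access_log. This is an untested sketch, not a verified fix.

```nginx
http {
    # ... the $ifbot map from the configuration above goes here ...

    # Hypothetical helper variables: each one is 1 for exactly one
    # class of client and 0 for every other class.
    map $ifbot $log_human  { default 0; 0 1; }
    map $ifbot $log_spider { default 0; 1 1; }
    map $ifbot $log_rss    { default 0; 2 1; }
    map $ifbot $log_badbot { default 0; 3 1; }

    server {
        # Each log receives a request only when its condition is
        # neither "0" nor empty, so a request goes to one file only.
        access_log /web/log/location_access.log main if=$log_human;
        access_log /web/log/spider_access.log   main if=$log_spider;
        access_log /web/log/rssbot_access.log   main if=$log_rss;
        access_log /web/log/badbot.log          main if=$log_badbot;

        # Reject unwanted clients; logging is handled by the
        # access_log directives above, not inside the if block.
        if ($ifbot = 3) {
            return 403;
        }
    }
}
```

Deriving the conditions with map avoids the if/set pairs entirely, so the logging decision depends only on the value of $ifbot.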
Another question: some robot requests are not written to
spider_access.log but do appear in location_access.log. It seems that my map is
not working. Is something wrong with how I define the "map"?
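
One thing that may be worth checking, offered as an assumption rather than a diagnosis: map tests regular expressions in their order of appearance and uses the first one that matches. In the map above, generic patterns such as "~*Bot" and "~*Spider" come before more specific ones such as "~*AhrefsBot" or "~*JikeSpider", so those clients would receive the value 1 instead of 3. The sketch below reorders only a few entries to illustrate the idea. The log_format name botdebug and the file bot_debug.log are hypothetical and exist only to show which value a given User-Agent actually receives.

```nginx
http {
    map $http_user_agent $ifbot {
        default           0;

        # Specific unwanted crawlers first, so the generic "Bot" and
        # "Spider" patterns below do not capture them.
        "~*AhrefsBot"     3;
        "~*TurnitinBot"   3;
        "~*omgilibot"     3;
        "~*JikeSpider"    3;
        "~*YisouSpider"   3;
        "~*pyspider"      3;

        # Generic search-engine patterns last.
        "~*Bot"           1;
        "~*Spider"        1;

        # Remaining entries from the original map are omitted here.
    }

    # Hypothetical debug format: records the User-Agent next to the
    # value the map assigned, to spot clients classified unexpectedly.
    log_format botdebug '$remote_addr "$http_user_agent" ifbot=$ifbot';

    server {
        access_log /web/log/bot_debug.log botdebug;
    }
}
```

Any robot that shows ifbot=0 in such a debug log has a User-Agent that none of the patterns match, which would send it to the human log.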
Posted at Nginx Forum: http://forum.nginx.org/read.php?2,258417,258417#msg-258417