how to separate robot access log and human access log

meteor8488 nginx-forum at nginx.us
Mon Apr 27 23:45:17 UTC 2015


Hi all,

I'm trying to separate the robot access log from the human access log, so
I'm using the configuration below:

http {
....
    map $http_user_agent $ifbot {
        default 0;
        "~*rogerbot"        3;
        "~*ChinasoSpider"       3;
        "~*Yahoo"           1;
        "~*Bot"         1;
        "~*Spider"          1;
        "~*archive"         1;
        "~*search"          1;
        "~*Yahoo"           1;
        "~Mediapartners-Google" 1;
        "~*bingbot"         1;
        "~*YandexBot"           1;
        "~*Feedly"  2;
        "~*Superfeedr"  2;
        "~*QuiteRSS"    2;
        "~*g2reader"    2;
        "~*Digg"    2;
        "~*trendiction"     3;
        "~*AhrefsBot"           3;
        "~*curl"            3;
        "~*Ruby"            3;
        "~*Player"          3;
        "~*Go\ http\ package"   3;
        "~*Lynx"            3;
        "~*Sleuth"          3;
        "~*Python"          3;
        "~*Wget"            3;
        "~*perl"            3;
        "~*httrack"         3;
        "~*JikeSpider"          3;
        "~*PHP"         3;
        "~*WebIndex"            3;
        "~*magpie-crawler"      3;
        "~*JUC"         3;
        "~*Scrapy"          3;
        "~*libfetch"            3;
        "~*WinHTTrack"      3;
        "~*htmlparser"      3;
        "~*urllib"          3;
        "~*Zeus"            3;
        "~*scan"            3;
        "~*Indy\ Library"       3;
        "~*libwww-perl"     3;
        "~*GetRight"            3;
        "~*GetWeb!"         3;
        "~*Go!Zilla"            3;
        "~*Go-Ahead-Got-It"     3;
        "~*Download\ Demon" 3;
        "~*TurnitinBot"     3;
        "~*WebscanSpider"       3;
        "~*WebBench"        3;
        "~*YisouSpider"     3;
        "~*check_http"      3;
        "~*webmeup-crawler"     3;
        "~*omgili"      3;
        "~*blah"        3;
        "~*fountainfo"      3;
        "~*MicroMessenger"      3;
        "~*QQDownload"      3;
        "~*shoulu.jike.com"     3;
        "~*omgilibot"       3;
        "~*pyspider"        3;
    }
....
}



And in the server part, I'm using:

if ($ifbot = "1") {
    set $spiderbot 1;
}
if ($ifbot = "2") {
    set $rssbot 1;
}
if ($ifbot = "3") {
    return 403;
    access_log /web/log/badbot.log  main;
}

access_log /web/log/location_access.log  main;
access_log /web/log/spider_access.log main if=$spiderbot;
access_log /web/log/rssbot_access.log main if=$rssbot;


But it seems that nginx still writes some robot requests to both
location_access.log and spider_access.log.

How can I keep the robot requests out of the human access log?

Another question: some robot requests are not written to
spider_access.log but do appear in location_access.log, so it seems that my
map is not matching them. Is there anything wrong with how I define the "map"?
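
For reference, the split I'm after could probably be sketched with extra
maps instead of if/set, so that every access_log (including the human one)
has its own condition. This is an untested sketch: it assumes the "if="
parameter of access_log (nginx 1.7.0 or later) and the "main" log format
already defined elsewhere, and the $log_* variable names are hypothetical
ones chosen only for illustration:

```nginx
# http context: derive one 0/1 flag per log file from $ifbot.
# The $log_* names are hypothetical; $ifbot comes from the map above.
map $ifbot $log_human  { default 0; 0 1; }   # no bot pattern matched
map $ifbot $log_spider { default 0; 1 1; }   # search engine crawlers
map $ifbot $log_rss    { default 0; 2 1; }   # feed readers
map $ifbot $log_badbot { default 0; 3 1; }   # clients to be rejected

server {
    # listen, server_name, locations etc. omitted

    # A request is written to a log only when its flag is
    # neither "0" nor an empty string.
    access_log /web/log/location_access.log main if=$log_human;
    access_log /web/log/spider_access.log   main if=$log_spider;
    access_log /web/log/rssbot_access.log   main if=$log_rss;
    access_log /web/log/badbot.log          main if=$log_badbot;

    # Reject bad bots; the 403 should still be logged through the
    # server-level access_log directives above.
    if ($ifbot = 3) {
        return 403;
    }
}
```

With this layout each request should end up in exactly one of the four
files, because exactly one of the flags is 1 for any value of $ifbot.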

Posted at Nginx Forum: http://forum.nginx.org/read.php?2,258417,258417#msg-258417


