Problem with rewrite last giving a HTTP 301 ?

Maxim Dounin mdounin at mdounin.ru
Wed Sep 22 06:13:23 MSD 2010


Hello!

On Tue, Sep 21, 2010 at 06:10:05PM -0400, toto2008 wrote:

> Hello,
> 
> I'm having a weird problem with my website.
> In my nginx conf, I have this rule:
> rewrite ^/robots\.txt$ /cms/robotstxt.php last;
> 
> and a location to handle PHP files:
> location ~ \.php$ {
>     fastcgi_pass   127.0.0.1:9000;
>     fastcgi_index  index.php;
>     fastcgi_param  SCRIPT_FILENAME  $document_root$fastcgi_script_name;
>     include fastcgi_params;
> }
> 
> When I visit robots.txt, I get the expected result (a robots.txt
> dynamically generated). If I check the nginx log file, as expected I get
> a 200 HTTP answer :
> IPADDRESS - - [21/Sep/2010:16:22:57 +0200] "GET /robots.txt HTTP/1.0"
> 200 231 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1;
> Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR
> 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; .NET4.0E)"
> If I check with fiddler (a HTTP debugger), everything is also OK : I
> have my robots.txt file with the 200 HTTP code.
> 
> Checking the logs, most of the time the 200 code is there.
> 
> Now, here is the problem: sometimes, for reasons I don't know,
> nginx sends a 301 code instead of the 200 code. Here's an example
> of the googlebot visiting my website:
> 66.249.65.168 - - [21/Sep/2010:12:02:09 +0200] "GET /robots.txt
> HTTP/1.0" 301 178 "-" "Googlebot-Image/1.0"
> 66.249.65.166 - - [21/Sep/2010:12:02:09 +0200] "GET /cms/robotstxt.php
> HTTP/1.0" 200 231 "-" "Googlebot-Image/1.0"
> 
> Of course, I don't want bots or people to visit this /cms/robotstxt.php
> page directly...
> 
> Tonight, for the first time, I've also found this problem with another
> rewrite rule. Again, I have a rule like:
> rewrite "^/([0-9]+)-([a-z0-9-]*)-([a-z]{2})$"
> "/cms/pages.php?id=$1;title=$2;language=$3" last;
> and normally I get a 200 HTTP code when I visit one of my pages.
> But I discovered some weird logs from the Yandex bot:
> 95.108.151.244 - - [21/Sep/2010:21:20:56 +0200] "GET /123-mypage-fr
> HTTP/1.0" 301 178 "-" "Mozilla/5.0 (compatible; YandexBot/3.0;
> MirrorDetector; +http://yandex.com/bots)"
> 
> 
> I've also tried to fetch the robots.txt file from the google webmaster
> tools, but I got a correct 200 HTTP answer.
> 
> Does anybody know what the problem could be? I have no idea how I could
> reproduce these strange results with my browser, wget, or any other
> tool.

Most likely it's your CMS code which detects something in the bots' 
requests (e.g. a wrong Host header) and issues a redirect.

Either look into its code or try logging something like 
$upstream_http_location; it should give you a better idea of 
what's going on.
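
A minimal sketch of such logging, assuming a dedicated log format. The 
format name "upstream_debug" and the log file path are placeholders of 
mine, not something from your config; the variables themselves are 
standard nginx ones. If the 301 comes from the backend, 
upstream_location should show where it redirects to, and the host 
fields should show what the bot actually sent:

```nginx
# http {} level. "upstream_debug" is a hypothetical format name.
log_format upstream_debug '$remote_addr [$time_local] "$request" '
                          'status=$status host="$host" '
                          'http_host="$http_host" '
                          'upstream_status=$upstream_status '
                          'upstream_location="$upstream_http_location"';

# server {} level. The file path is an assumption; adjust as needed.
access_log /var/log/nginx/upstream_debug.log upstream_debug;
```

If upstream_status and upstream_location come out as "-" on the 301 
lines, the request never reached the backend, so the redirect was not 
issued by the PHP code.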

Maxim Dounin


