Having issues with nginx / root captures (0.7.53)

Igor Sysoev is at rambler-co.ru
Fri May 1 10:41:18 MSD 2009


On Thu, Apr 30, 2009 at 11:16:42PM -0700, Michael Shadle wrote:

> sorry, that was supposed to be bar.com - i just messed up substituting
> 
> 2009/4/30 Igor Sysoev <is at rambler-co.ru>:
> 
> > First, "~^foo(.*?)\.bar\.ssgisp\.com$" will never match "foo2.mike.bar.com".
> > Second, "~^foo(.*?)\.bar\.com$" will capture "2.mike" with "foo2.mike.bar.com".
> 
> You're right though, something in the files is messing with my matching.
> 
> What is it in this file that is setting up some sort of capture?
> 
> For some reason this turns a
> 
> foo123.mike.bar.com into /home/mike/web/foo, not foo123
> 
> does -any- regular expression mess with the regexps
> 
> location ^~ /robots.txt {
>         auth_basic off;
>         root /etc/nginx/robots;
>         break;

You do not need "break" here. It is a waste of CPU cycles.

> }
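
As a minimal sketch, the same location with the "break" dropped might look like this. The directives are copied unchanged from the quoted block; the assumption is that nothing else in this location relies on the "break".

```nginx
# Exact-prefix location for robots.txt, without the redundant "break".
location ^~ /robots.txt {
        auth_basic off;
        root /etc/nginx/robots;
}
```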
> 
> if ($http_user_agent ~* googlebot) {
>         return 404;
>         break;
> }
> 
> if ($http_user_agent ~* looksmart) {
>         return 404;
>         break;
> }
> 
> if ($http_user_agent ~* crawl) {
>         return 404;
>         break;
> }
> 
> if ($http_user_agent ~* robot) {
>         return 404;
>         break;
> }
> 
> if ($http_user_agent ~* findlinks) {
>         return 404;
>         break;
> }
> 
> if ($http_user_agent ~* infoseek) {
>         return 404;
>         break;
> }
> 
> if ($http_user_agent ~* search) {
>         return 404;
>         break;
> }

The "break" after "return" costs nothing, but it is useless.
Also, it is better to combine all the checks into a single regex, which
will run much faster:

if ($http_user_agent ~* "googlebot|looksmart|...") {
       return 404;
}
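
Spelled out with the seven substrings from the quoted "if" blocks, the combined check could look like the sketch below. The alternation list is simply copied from those blocks, not a recommended list. Because "~*" is an unanchored, case-insensitive match, short words such as "search" or "robot" match any user agent that contains them.

```nginx
# One case-insensitive regex instead of seven separate "if" blocks.
# The substrings are taken from the quoted configuration as-is.
if ($http_user_agent ~* "googlebot|looksmart|crawl|robot|findlinks|infoseek|search") {
        return 404;
}
```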


-- 
Igor Sysoev
http://sysoev.ru/en/




