preventing rewrite loops with "index"

Dennis J. dennisml at conversis.de
Mon Jan 25 15:58:21 MSK 2010


On 01/25/2010 11:57 AM, Maxim Dounin wrote:
> Hello!
>
> On Mon, Jan 25, 2010 at 08:58:18AM +0200, Marcus Clyne wrote:
>
>> Hi,
>>
>> Maxim Dounin wrote:
>>> Hello!
>>>
>>> On Sun, Jan 24, 2010 at 10:45:32PM +0100, Piotr Sikora wrote:
>>>
>>>>> 2. Note "internal" in location /users/.  It means "only visible
>>>>> for internal redirects", so even user called "users" should be
>>>>> correctly processed by the first location.
>>>> Actually, this isn't true. Any attempt to access an internal location
>>>> directly results in a 404 response.
>>>>
>>>> You can verify this with very simple configuration:
>>>>
>>>> server {
>>>>    listen 8000;
>>>>    location / { return 500; }
>>>>    location /x { internal; return 500; }
>>>> }
>>>>
>>>> Accessing /x will result in a 404 response.
>> The example is obviously correct, but it doesn't truly explain the
>> reason for getting the 404 when accessing /users/xxx URLs (even
>> though the result is almost the same).  The reason has to do with the
>> order in which locations are handled, specifically that ^~ locations
>> are handled before ~* and ~ ones, and if they match, then the regex
>> ones aren't tested.  If you try to access the URL /users/xxx, it will
>> therefore match the second location given by ^~, and return 404
>> because it's an internal location.  Therefore, trying to access
>> anything under a user named 'users' will fail (though the URL /users
>> on its own is ok, because that will match the regex location and not
>> the ^~ location).
>
> It's somewhat obvious.
>
>>
>> Using location /users in the original locations will result in an
>> internal server error, because the regex location will match before
>> the /users location each time the URI is checked, creating an
>> infinite loop.
>
> By "original" you mean the config I suggested to Dennis J?  No, as the
> first rewrite will add '/' to it, and on the next iteration it will be
> caught by /users/.
>
> The problem will arise with directory redirects, though
> (/username/dir ->  /username/dir/), as they will use paths after
> rewrites, and this isn't what we need here.  When a user has a dir in
> their htdocs, we need the redirect "/user/dir" ->  "/user/dir/", but
> the config will issue a "/users/u/s/e/users/dir/" one.
>
> From the above I think that using alias will be better.  In
> 0.8.* this may be done with named captures and nested locations,
> like this:
>
>     location ~* ^/(?<name>(?<n1>[a-z])(?<n2>[a-z0-9])(?<n3>[a-z0-9])[^/]*)(?<p>/.*)?$ {
>         alias /tmp/users/$n1/$n2/$n3/$name/htdocs$p;
>
>         location ~ \.(mpg|zip|avi)$ {
>             valid_referers localhost none blocked;
>             if ($invalid_referer) {
>                 return 403;
>             }
>         }
>     }

I was trying something like that in 0.7 but couldn't get around the
problem of captured variables being smashed (overwritten) by later regex
matches.
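A minimal sketch of how numbered captures get overwritten, assuming numbered ($1..$5) rather than named captures and a second regex that is evaluated after the per-user one (here a nested location shaped like the 0.8 example, assuming the version in use accepts that nesting):

    location ~* ^/(([a-z])([a-z0-9])([a-z0-9])[^/]*)(/.*)?$ {
        # $1..$5 are not copied here; they are looked up when the
        # file is actually served, i.e. after all location regexes
        # have been matched
        alias /tmp/users/$2/$3/$4/$1/htdocs$5;

        location ~ \.(mpg|zip|avi)$ {
            # matching this regex replaces the numbered captures:
            # $1 now holds the extension, so the inherited alias
            # points to the wrong directory for these files
            valid_referers localhost none blocked;
            if ($invalid_referer) {
                return 403;
            }
        }
    }

Named captures such as $name or $p are stored as variables at match time, which is why the 0.8 version does not run into this.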

The approach you posted works fine, but I'm a bit uncomfortable with it
because it feels like a hack. One modification I tried was using something
like "/__user/" as the second location and then explicitly doing a rewrite
in the first, external location to pass execution of the request on to the
second location block. Think of it as an emulated "goto". The advantage
would have been that the config is more explicit and would not need to
rely on features such as "^~" and "internal" to do some magic.
Unfortunately that fails once "index" enters the picture and I end up with
requests to "../u/s/e/users...".
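A rough sketch of the "goto" layout described above, assuming the same /tmp/users tree as earlier in the thread; "/__user/" is only the example prefix, and this is an illustration of the idea rather than a verified working configuration:

    # external entry point: map /username/... onto the internal prefix.
    # The rewrite repeats the full regex because $1..$5 in the
    # replacement refer to the rewrite's own captures, not the
    # location's.
    location ~* ^/[a-z][a-z0-9][a-z0-9] {
        rewrite "(?i)^/(([a-z])([a-z0-9])([a-z0-9])[^/]*)(/.*)?$"
                /__user/$2/$3/$4/$1/htdocs$5 last;
    }

    # internal target: "_" is not in [a-z], so the regex location
    # above never matches these URIs and no ^~ is needed to stop
    # the rewrite from looping
    location /__user/ {
        internal;
        alias /tmp/users/;
        # caveat: a directory request without the trailing slash
        # (e.g. /user/dir) is expected to be redirected to the
        # rewritten /__user/... URI, which "internal" then rejects;
        # this is the directory-redirect problem Maxim describes
    }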

> In older versions one has to create separate locations for normal
> files and ones which need special processing, e.g.
>
>     location ~* ^/(([a-z])([a-z0-9])([a-z0-9])[^/]*)(/.*\.(mpg|zip|avi))$ {
>         alias /tmp/users/$2/$3/$4/$1/htdocs$5;
>         valid_referers localhost none blocked;
>         if ($invalid_referer) {
>             return 403;
>         }
>     }
>
>     location ~* ^/(([a-z])([a-z0-9])([a-z0-9])[^/]*)(/.*)?$ {
>         alias /tmp/users/$2/$3/$4/$1/htdocs$5;
>     }
>

This splitting up of the configuration is something I'm trying to avoid,
since I still need to add other checks. One is a check whether a special
file exists and, if it does, to deny the user access. In your initial
"/users/" config I can put that once in the first location block, but here
I would have to duplicate it in both. It wouldn't be terrible, but it
doesn't look as clear, concise and straightforward as the 0.8 example you
mentioned above.
The other thing I still have to add is the handling of *.php files. What is
special here is that I have to check another special file in the user
directory to see which PHP upstream I need to pass things to (there are two
different ones). What I'm planning to do is to write a little module that
parses the special file and sets a variable according to its contents. Then
I'll use something like "fastcgi_pass $php_upstream" to pass the request to
the appropriate upstream servers.
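A minimal sketch of how that could slot into the 0.8 layout, assuming the running version accepts variables in fastcgi_pass (in which case the value is looked up among the defined upstream groups). The upstream names php_a/php_b and their addresses are placeholders, and $php_upstream stands for the variable the planned module would set:

    # http level: the two PHP backends (names and addresses are placeholders)
    upstream php_a { server 127.0.0.1:9000; }
    upstream php_b { server 127.0.0.1:9001; }

    # nested inside the per-user regex location, next to the media-file one
    location ~ \.php$ {
        # $php_upstream is assumed to be set by the planned module
        # to either "php_a" or "php_b"
        fastcgi_pass $php_upstream;
        include fastcgi_params;
        # with alias, $request_filename is the mapped path under htdocs
        fastcgi_param SCRIPT_FILENAME $request_filename;
    }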

I think I'll give 0.8 and your config from above a try as that seems to be 
the cleanest way to handle this.

Regards,
   Dennis


