URL encoding issue

Maxim Dounin mdounin at mdounin.ru
Fri Apr 16 04:42:54 MSD 2010


Hello!

On Thu, Apr 15, 2010 at 04:13:10PM +0530, Harish Sundararaj wrote:

> I have a problem with nginx's urlencoding when it does a rewrite and passes
> it on to a proxy backend. Below is the config:
> 
> location ~ ^/search/([^/]+)/?([^-/&]+)?.*$ {
>   set $query $1;
>   set $filter $2;
> 
>   rewrite ^ /search?q=$query&cat=$filter&$args? last;
> }
>
> location ~ ^/(search) {
>
>                 proxy_read_timeout 180;
>                 proxy_set_header X-Real-IP $remote_addr;
>                 proxy_pass http://load_balance;
> }

[...]

> Now the problem is if the query has special characters like & , ", ' etc...
> they are not urlencoded before rewriting:
> 
> Eg:
> 
> /search/jhonson&jhonson  becomes  /search?q=jhonson&jhonson    but what I
> expect is: /search?q=jhonson&26jhonson
> 
> /search/"tourism in paris" becomes /search?q=\"tourism%20in%20paris\"  but
> what I expect is /search?q=%22tourism%20in%20paris%22  [This is weird spaces
> are urlencoded properly]
> 
> Even if i try to urlencode it from the frontend(Javascript) like:
> 
> /search/jhonson&26jhonson still becomes /search?q=jhonson&jhonson  which
> again looks weird to me !
> 
> Am i missing anything? Is there anything that i can do to get what i expect
> ?

Some background: nginx does location matching on the unescaped URI 
path.  So the variables you set from captures contain unescaped 
data.
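
To illustrate why the frontend-encoded request loses its encoding, 
here is a minimal Python sketch of the unescaping step.  It only 
models the effect with the standard library's urllib.parse.unquote; 
it is not how nginx implements it, and the request path is an 
assumed example in which the frontend sent "%26" for "&".

```python
from urllib.parse import unquote

# Assumed example: the path as a frontend might send it, with "&"
# percent-encoded as %26.
raw_path = "/search/jhonson%26jhonson"

# Location matching sees the unescaped form, so a capture taken from
# it already contains a literal "&".
unescaped_path = unquote(raw_path)
captured = unescaped_path.rsplit("/", 1)[1]

print(unescaped_path)
print(captured)
```

Whatever is captured from the unescaped path and stored with "set" 
no longer carries the original percent-encoding.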

And there are two separate issues here:

1. In rewrites nginx does not escape data substituted from 
variables.  It's your responsibility to escape supplied data 
correctly.  This is on purpose - one should be able to add many 
arguments at once, e.g.:

    set $x "a=1&b=2&c=3";
    rewrite ^ /something?$x;

The only exception is enumerated captures ($1, $2, ...) extracted 
from the URI during the rewrite in question (in 0.8.35 it even 
escapes "&").

It is believed that the correct solution would be to implement 
some urlencode/urldecode functions, but there is no consensus on 
the desired syntax yet.  There are patches for $urlencode_* / 
$urldecode_* variables by Kirill Korinskiy floating around, but 
they were explicitly rejected by Igor.
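
For reference, the escaped forms expected in the original question 
can be checked outside nginx.  This is a minimal Python sketch using 
the standard library's urllib.parse.quote; the helper name 
escape_query_value is made up for illustration and is not something 
nginx provides.

```python
from urllib.parse import quote

def escape_query_value(value: str) -> str:
    """Percent-encode one query-string value (hypothetical helper)."""
    # safe="" makes quote() escape "&", "/" and '"' as well, since
    # none of them may appear raw inside a single argument value.
    return quote(value, safe="")

# The two examples from the original question.
print(escape_query_value("jhonson&jhonson"))
print(escape_query_value('"tourism in paris"'))
```

Data escaped this way before it reaches a variable can be 
substituted into the rewrite arguments without "&" being taken as 
an argument separator.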

2. When doing proxy_pass nginx does escape characters which aren't 
valid in a URI, but it doesn't escape some other chars which 
aren't valid either (like "<", ">", <">).  That's why you see the 
space escaped, but not <">.

Configuration which should resolve most of your problems (at least 
in 0.8.35) is:

    location /search {
        rewrite  ^/search/([^/]+)/?([^-/&]+)?
                  /search?q=$1&cat=$2  break;

        proxy_pass http://load_balance;
        ...
    }

This won't fix '"' though.  But it is believed that this shouldn't 
lead to severe problems (please report back if it does, this will 
help me to persuade Igor to apply the patch which fixes it).

Maxim Dounin


