URL encoding issue
Maxim Dounin
mdounin at mdounin.ru
Fri Apr 16 04:42:54 MSD 2010
Hello!
On Thu, Apr 15, 2010 at 04:13:10PM +0530, Harish Sundararaj wrote:
> I have a problem with nginx's urlencoding when it does a rewrite and passes
> it on to a proxy backend. Below is the config:
>
> location ~ ^/search/([^/]+)/?([^-/&]+)?.*$ {
> set $query $1;
> set $filter $2;
>
> rewrite ^ /search?q=$query&cat=$filter&$args? last;
> }
>
> location ~ ^/(search) {
>
> proxy_read_timeout 180;
> proxy_set_header X-Real-IP $remote_addr;
> proxy_pass http://load_balance;
> }
[...]
> Now the problem is if the query has special characters like & , ", ' etc...
> they are not urlencoded before rewriting:
>
> Eg:
>
> /search/jhonson&jhonson becomes /search?q=jhonson&jhonson but what I
> expect is: /search?q=jhonson%26jhonson
>
> /search/"tourism in paris" becomes /search?q=\"tourism%20in%20paris\" but
> what I expect is /search?q=%22tourism%20in%20paris%22 [This is weird spaces
> are urlencoded properly]
>
> Even if i try to urlencode it from the frontend(Javascript) like:
>
> /search/jhonson%26jhonson still becomes /search?q=jhonson&jhonson which
> again looks weird to me !
>
> Am i missing anything? Is there anything that i can do to get what i expect
> ?
Some background: nginx does location matching on the unescaped URI
path, so the variables you set from the captures contain unescaped
data.
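To make the background concrete, here is a rough Python sketch of the effect. It is not nginx code: it assumes only that matching happens on the decoded path, as described above, and reuses the regex from the quoted config. The names raw_path, decoded_path, query and category are made up for illustration.

```python
import re
from urllib.parse import unquote

# Hypothetical request path, with "&" percent-encoded by the client.
raw_path = "/search/jhonson%26jhonson"

# Assumption: the path is decoded before the location regex is applied.
decoded_path = unquote(raw_path)

# Same pattern as the first location in the quoted config.
match = re.match(r"^/search/([^/]+)/?([^-/&]+)?.*$", decoded_path)
query = match.group(1)           # captured from the decoded path
category = match.group(2) or ""  # second capture is optional

# Plain substitution, with no escaping, mirrors what the rewrite does
# with values taken from variables.
rewritten = "/search?q={}&cat={}".format(query, category)
print(rewritten)
```

The capture already holds a literal "&", so the client-side encoding is lost before the rewrite runs.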
And there are two separate issues here:
1. In rewrites nginx does not escape data substituted from
variables. It's your responsibility to escape supplied data
correctly. This is on purpose - one should be able to add many
arguments at once, e.g.:
    set $x "a=1&b=2&c=3";
    rewrite ^ /something?$x;
The only exception is numbered captures ($1, $2, ...) extracted
from the URI during the rewrite in question (in 0.8.35 it even
escapes "&"). It is believed that the correct solution would be to
implement some urlencode/urldecode functions, but there is no
consensus on the desired syntax yet. There are patches for
$urlencode_* / $urldecode_* variables by Kirill Korinskiy floating
around, but they were explicitly rejected by Igor.
2. When doing proxy_pass nginx does escape characters which aren't
valid in a URI, but it fails to escape some characters which are
also invalid (like "<", ">", and <">). That's why you see the space
escaped, but not <">.
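.">
For reference, the encoding the original poster expects can be sketched in Python as below. This only illustrates what "escaping supplied data correctly" means for a single query value. It is not something nginx does for you, and build_search_uri is a hypothetical helper, not part of any library.

```python
from urllib.parse import quote


def build_search_uri(query: str, category: str = "") -> str:
    """Hypothetical helper: percent-encode each value, then join them."""
    # safe="" makes quote() encode "&", "/" and '"' too, so a value can
    # never be mistaken for an argument separator.
    return "/search?q={}&cat={}".format(
        quote(query, safe=""), quote(category, safe="")
    )


uri_amp = build_search_uri("jhonson&jhonson")
uri_quotes = build_search_uri('"tourism in paris"')
print(uri_amp)
print(uri_quotes)
```

Encoding each value separately, rather than the whole query string, keeps the "&" that separates arguments intact while protecting any "&" inside a value.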
Configuration which should resolve most of your problems (at least
in 0.8.35) is:
    location /search {
        rewrite ^/search/([^/]+)/?([^-/&]+)?
                /search?q=$1&cat=$2 break;
        proxy_pass http://load_balance;
        ...
    }
This won't fix '"' though. It is believed that this shouldn't
lead to severe problems (please report back if it does, as this
will help me persuade Igor to apply the patch which fixes it).
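As a rough check of why an unescaped '"' is often harmless, the sketch below feeds such a query string to Python's standard parser. It assumes a Python backend and says nothing about how other backends or intermediaries treat the character. The sample query string is made up to match the example from the original question.

```python
from urllib.parse import parse_qs

# Query string as the backend might receive it: the space is escaped,
# the double quotes are not.
raw_query = 'q="tourism%20in%20paris"&cat='

# keep_blank_values=True keeps the empty "cat" argument in the result.
params = parse_qs(raw_query, keep_blank_values=True)
print(params)
```

Here the parser passes the literal quotes through unchanged, so the backend sees the same value it would get from %22.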
Maxim Dounin