[PATCH] HTTP: Add new uri_normalization_percent_decode option
Maxim Dounin
mdounin at mdounin.ru
Sat Apr 1 20:11:38 UTC 2023
Hello!
On Thu, Mar 30, 2023 at 05:19:08PM +0000, Michael Kourlas via nginx-devel wrote:
> Hello,
>
> Thanks again for your comments.
>
> > This implies, basically, that there are 3 forms of the request
> > URI: 1) fully encoded, as in $request_uri, 2) fully decoded, as in
> > $uri now, and 3) "all-except-percent-and-reserved". To implement this
> > correctly, it needs clear definition when each form is used, and
> > it is going to be a non-trivial task to do this safely.
>
> I agree. A simple way to do this would be to make percent-decoding customizable
> on a per-directive basis. The core use case I was hoping to support is
> preserving encoded reserved characters in location matching (basically what was
> proposed in [1]), so that is what I would like to focus on in a reworked
> version of this patch.
>
> I propose the following:
>
> (1) The addition of a new variable called $uri_encoded_percent_and_reserved. As
> discussed, this variable is a special version of the normalized URI ($uri)
> that preserves any percent-encoded "%" or reserved characters.
>
> (2) Every transformation applied to $uri (e.g. from the "rewrite" directive,
> internal redirects, etc.) is automatically applied to
> $uri_encoded_percent_and_reserved as well.
>
> If this raises performance concerns, a new flag could be added to enable or
> disable the availability of $uri_encoded_percent_and_reserved.
You suggest that transformations of $uri are "automatically
applied" to the non-fully-decoded variant. Consider the following
rewrite:
rewrite ^/(.*) /$1 break;
Assuming request to "GET /foo%2fbar/", what
$uri_encoded_percent_and_reserved do you expect after each of
these rewrites? Similarly, consider the following rewrite:
rewrite ^/foo/(.*) /$1 break;
What $uri_encoded_percent_and_reserved is expected after the
rewrite?
> (3) The addition of a new optional parameter to the URI form of "location"
> blocks called "match-source":
>
> location [ = | ~ | ~* | ^~ ] uri [match-source=uri|uri-encoded-percent-and-reserved] {
> ...
> }
>
> For example:
>
> location ~ ^/api/objects/[^/]+/subobjects(/.*)?$ match-source=uri-encoded-percent-and-reserved {
> ...
> }
>
> "match-source=uri" is the default and the current behaviour. When
> "uri-encoded-percent-and-reserved" is used, the location matching for that
> block uses $uri_encoded_percent_and_reserved rather than $uri. Nested location
> blocks are not affected (unless they also use
> "uri-encoded-percent-and-reserved").
>
> In future it would be possible to use a similar pattern with other directives
> that use $uri, such as "proxy_pass", but that can be done as part of a separate
> patch.
>
> If you think this is a sensible approach, I will submit a revised patch
> implementing it.
Consider the following configuration:
location /foo%2fbar/ match-source=uri-encoded-percent-and-reserved {
...
}
location /foo/bar/ match-source=uri {
...
}
The question is: which location is expected to be matched for the
request "GET /foo%2fbar/"?
Other questions include:
- Which location is expected to be matched for the request "GET
/foo%2Fbar/" (note that it is exactly equivalent to "GET
/foo%2fbar/").
- Assuming static handling in the locations, what happens with the
request "GET /foo%2fbar/..%2fbazz"?
Note that the behaviour does not seem to be obvious, and it is an
open question if it can be clarified to be safe.
--
Maxim Dounin
http://mdounin.ru/
More information about the nginx-devel
mailing list