[PATCH] HTTP: Add new uri_normalization_percent_decode option

Maxim Dounin mdounin at mdounin.ru
Sat Apr 1 20:11:38 UTC 2023


Hello!

On Thu, Mar 30, 2023 at 05:19:08PM +0000, Michael Kourlas via nginx-devel wrote:

> Hello,
> 
> Thanks again for your comments.
> 
> > This implies, basically, that there are 3 forms of the request
> > URI: 1) fully encoded, as in $request_uri, 2) fully decoded, as in
> > $uri now, and 3) "all-except-percent-and-reserved". To implement this
> > correctly, it needs clear definition when each form is used, and
> > it is going to be a non-trivial task to do this safely.
> 
> I agree. A simple way to do this would be to make percent-decoding customizable
> on a per-directive basis. The core use case I was hoping to support is
> preserving encoded reserved characters in location matching (basically what was
> proposed in [1]), so that is what I would like to focus on in a reworked
> version of this patch.
> 
> I propose the following:
> 
> (1) The addition of a new variable called $uri_encoded_percent_and_reserved. As
> discussed, this variable is a special version of the normalized URI ($uri)
> that preserves any percent-encoded "%" or reserved characters.
> 
> (2) Every transformation applied to $uri (e.g. from the "rewrite" directive,
> internal redirects, etc.) is automatically applied to
> $uri_encoded_percent_and_reserved as well.
> 
> If this raises performance concerns, a new flag could be added to enable or
> disable the availability of $uri_encoded_percent_and_reserved.

You suggest that transformations of $uri are "automatically 
applied" to the non-fully-decoded variant.  Consider the following 
rewrite:

   rewrite ^/(.*) /$1 break;

Assuming a request to "GET /foo%2fbar/", what 
$uri_encoded_percent_and_reserved do you expect after this 
rewrite?  Similarly, consider the following rewrite:

   rewrite ^/foo/(.*) /$1 break;

What $uri_encoded_percent_and_reserved is expected after the 
rewrite?
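
To make the second case concrete, here is a minimal Python sketch.
It is not nginx code: the partial_decode() helper is hypothetical and
only approximates the proposed "all-except-percent-and-reserved" form
(ASCII only, reserved set taken from RFC 3986), and Python's re module
stands in for the rewrite engine.

```python
import re
from urllib.parse import unquote

# RFC 3986 gen-delims and sub-delims (assumed to be the "reserved" set).
RESERVED = set(":/?#[]@!$&'()*+,;=")


def partial_decode(raw: str) -> str:
    """Hypothetical helper: decode %XX escapes, but keep escapes that
    encode "%" or a reserved character. Non-ASCII bytes are left encoded
    to keep the sketch simple."""
    def repl(match: re.Match) -> str:
        code = int(match.group(1), 16)
        ch = chr(code)
        if code >= 0x80 or ch == "%" or ch in RESERVED:
            return match.group(0)
        return ch

    return re.sub(r"%([0-9A-Fa-f]{2})", repl, raw)


raw = "/foo%2fbar/"
uri = unquote(raw)              # fully decoded form, like $uri
partial = partial_decode(raw)   # proposed partially decoded form

# Stand-in for: rewrite ^/foo/(.*) /$1 break;
pattern, replacement = r"^/foo/(.*)", r"/\1"
new_uri = re.sub(pattern, replacement, uri)
new_partial = re.sub(pattern, replacement, partial)

print(uri, "->", new_uri)
print(partial, "->", new_partial)
```

In this sketch the regex matches the fully decoded form and rewrites
it, but it does not match the partially decoded form at all, so the
two variants diverge after a single rewrite.  Which value the second
variable should then hold is exactly what needs to be defined.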

> (3) The addition of a new optional parameter to the URI form of "location"
> blocks called "match-source":
> 
> location [ = | ~ | ~* | ^~ ] uri [match-source=uri|uri-encoded-percent-and-reserved] {
>     ...
> }
> 
> For example:
> 
> location ~ ^/api/objects/[^/]+/subobjects(/.*)?$ match-source=uri-encoded-percent-and-reserved {
>     ...
> }
> 
> "match-source=uri" is the default and the current behaviour. When
> "uri-encoded-percent-and-reserved" is used, the location matching for that
> block uses $uri_encoded_percent_and_reserved rather than $uri. Nested location
> blocks are not affected (unless they also use
> "uri-encoded-percent-and-reserved").
> 
> In future it would be possible to use a similar pattern with other directives
> that use $uri, such as "proxy_pass", but that can be done as part of a separate
> patch.
> 
> If you think this is a sensible approach, I will submit a revised patch
> implementing it.

Consider the following configuration:

location /foo%2fbar/ match-source=uri-encoded-percent-and-reserved {
    ...
}

location /foo/bar/ match-source=uri {
    ...
}

The question is: which location is expected to be matched for the 
request "GET /foo%2fbar/"?

Other questions include:

- Which location is expected to be matched for the request "GET 
  /foo%2Fbar/" (note that it is exactly equivalent to "GET 
  /foo%2fbar/")?

- Assuming static handling in the locations, what happens with the 
  request "GET /foo%2fbar/..%2fbazz"?

Note that the expected behaviour is not obvious in any of these 
cases, and it is an open question whether it can be defined in a 
way that is safe.
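
The two questions above can be illustrated with a short Python
sketch.  It is not nginx code: posixpath.normpath() is used only as a
stand-in for dot-segment removal, and nginx's own normalization
differs in details.

```python
import posixpath
from urllib.parse import unquote

# Hex digits in percent-escapes are case-insensitive, so these two
# requests are equivalent, yet they differ as raw strings.
lower = "/foo%2fbar/"
upper = "/foo%2Fbar/"
same_when_decoded = unquote(lower) == unquote(upper)
same_as_raw_strings = lower == upper

# Dot-segment handling depends on when "%2f" is decoded.
request = "/foo%2fbar/..%2fbazz"

# Decode first, then normalize: "../" is seen and removed.
decoded_then_normalized = posixpath.normpath(unquote(request))

# Keep "%2f" encoded: "..%2fbazz" is a single, ordinary segment.
kept_encoded = posixpath.normpath(request)

print(same_when_decoded, same_as_raw_strings)
print(decoded_then_normalized)
print(kept_encoded)
```

A plain string comparison against "/foo%2fbar/" would therefore miss
the "%2F" spelling unless escapes are case-normalized first, and the
two forms of the last request resolve to different paths: one that
leaves "/foo/bar/" and one that stays below "/foo%2fbar/".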

-- 
Maxim Dounin
http://mdounin.ru/

