[PATCH] HTTP: Add new uri_normalization_percent_decode option

Michael Kourlas Michael.Kourlas at solace.com
Mon Apr 3 18:33:21 UTC 2023


Hello,

Thanks again for your feedback.

> Consider the following rewrite:
>
> rewrite ^/(.*) /$1 break;
>
> Assuming request to "GET /foo%2fbar/", what
> $uri_encoded_percent_and_reserved do you expect after each of
> these rewrites?

I do not think that rewrite does anything in practice. Following the rewrite,
I would expect $uri to remain unchanged at its current value of "/foo/bar/" and
$uri_encoded_percent_and_reserved to similarly remain unchanged at its current
value of "/foo%2fbar/".

> Similarly, consider the following rewrite:
>
> rewrite ^/foo/(.*) /$1 break;
>
> What $uri_encoded_percent_and_reserved is expected after the
> rewrite?

In this case the regular expression matches $uri but not
$uri_encoded_percent_and_reserved. One could say that this just means that only
$uri is updated, but that has the potential to cause confusion when a flag is
used that changes the control flow (unless the user explicitly opts into this
behaviour).

This could be addressed by adding a "match-source" optional argument to
"rewrite" with three values (and a default of "uri"):
* "uri" - rewrite directive matches and changes $uri only
* "uri_encoded_percent_and_reserved" - rewrite directive matches and changes
  $uri_encoded_percent_and_reserved only
* "all" - rewrite directive matches and changes both (if only one is matched,
  directive is not applied)

It might also be a good idea to add "uri_encoded_percent_and_reserved_regex"
and "uri_encoded_percent_and_reserved_replacement" arguments to be used with
"all", so that it is possible to use the same directive and flag even when
needing to perform slightly different rewrites for $uri versus
$uri_encoded_percent_and_reserved.

> Consider the following configuration:
>
> location /foo%2fbar/ match-source=uri-encoded-percent-and-reserved {
> ...
> }
>
> location /foo/bar/ match-source=uri {
> ...
> }
>
> The question is: which location is expected to be matched for the
> request "GET /foo%2fbar/"?

Both blocks match their respective variables. Since the first block has the
longest matching prefix, I expect it will be selected.

> Which location is expected to be matched for the request "GET
> /foo%2Fbar/" (note that it is exactly equivalent to "GET
> /foo%2fbar/").

Only the second block matches its variable, so I expect it will be selected.

Although paths are generally case sensitive, a percent-encoded character is not
supposed to be, so this behaviour is unfortunate. One possibility is to
automatically use case-insensitive matching for any part of a location prefix
using "match-source=uri-encoded-percent-and-reserved" that is a percent
encoded "%" or reserved character.

Alternatively this behaviour could be documented with an instruction to use
regular expressions instead.

> Assuming static handling in the locations, what happens with the
> request "GET /foo%2fbar/..%2fbazz"?

The first block would be used. However, $uri would be used for static
handling, so the path "/foo/bazz" would be looked up on the filesystem, not
"/foo%2fbar/..%2fbazz".

> Note that the behaviour does not seem to be obvious, and it is an
> open question if it can be clarified to be safe.

Fair enough. I am certainly happy to continue making changes to my proposal to
address the specific concerns you raise. However, are you saying that you have
broader overall concerns about safety, complexity, etc. that make a patch
implementing the proposal unlikely to be accepted, even if all specific
concerns are addressed?

Thanks,

Michael Kourlas
________________________________
 Confidentiality notice

This e-mail message and any attachment hereto contain confidential information which may be privileged and which is intended for the exclusive use of its addressee(s). If you receive this message in error, please inform sender immediately and destroy any copy thereof. Furthermore, any disclosure, distribution or copying of this message and/or any attachment hereto without the consent of the sender is strictly prohibited. Thank you.


More information about the nginx-devel mailing list