[PATCH] HTTP: Add new uri_normalization_percent_decode option
Maxim Dounin
mdounin at mdounin.ru
Fri Feb 17 12:59:12 UTC 2023
Hello!
On Wed, Feb 15, 2023 at 11:50:13AM -0500, Michael Kourlas via nginx-devel wrote:
> # HG changeset patch
> # User Michael Kourlas <michael.kourlas at solace.com>
> # Date 1676408746 18000
> # Tue Feb 14 16:05:46 2023 -0500
> # Node ID 129437ade41b14a584fb4b7558accc1b8dee7f45
> # Parent cffaf3f2eec8fd33605c2a37814f5ffc30371989
> HTTP: Add new uri_normalization_percent_decode option
>
> This patch addresses ticket #2225 by adding a new
> uri_normalization_percent_decode configuration option that controls which
> characters are percent-decoded by nginx as part of its URI normalization.
>
> The option has two values: "all" and "all-except-reserved". "all" is the
> default value and is the current behaviour. When the option is set to
> "all-except-reserved", nginx percent-decodes all characters except those in the
> reserved set defined by RFC 3986:
>
> reserved = gen-delims / sub-delims
>
> gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
>
> sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
> / "*" / "+" / "," / ";" / "="
>
> In addition, when "all-except-reserved" is used, nginx will not re-encode "%"
> from the request URI when it observes that it is part of a percent-encoded
> reserved character.
>
> When nginx percent-decodes reserved characters, this can often change the
> request URI's semantics, making it impossible to use a normalized URI for
> certain use cases. "uri_normalization_percent_decode" gives the configuration
> author the freedom to determine which reserved characters are semantically
> relevant and which are not.
>
> For example, consider the following location block, which handles part of a
> hypothetical API:
>
>     location ~ ^/api/objects/[^/]+/subobjects(/.*)?$ {
>         ...
>     }
>
> Because nginx always normalizes "%2F" to "/", this location block will not
> match a path of /api/objects/sample%2Fname/subobjects, even if the API permits
> "/" to appear percent-encoded in the URI as part of object names. nginx will
> instead interpret this as /api/objects/sample/name/subobjects, a completely
> different path. Setting "uri_normalization_percent_decode" to
> "all-except-reserved" will leave "%2F" encoded, resulting in the expected
> behaviour.
Thanks for the patch.
As far as I understand, it will irreversibly corrupt URIs with
double-encoded reserved characters. For example, "%252F" will
become "%2F" when proxying in the following configuration:
    location /foo/ {
        proxy_pass http://upstream/foo/;
    }
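To make the round trip concrete, here is a rough Python model of the behaviour as I read it from the patch description. It is only a sketch, not nginx code: the names decode_except_reserved() and escape_for_upstream() are made up for illustration, only "%" is re-escaped, and non-ASCII bytes are ignored.

```python
import re

# RFC 3986 reserved set: gen-delims plus sub-delims.
RESERVED = set(":/?#[]@!$&'()*+,;=")

# One percent-escape: "%" followed by two hex digits.
PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_except_reserved(uri: str) -> str:
    """Single-pass percent-decode that keeps escapes of reserved characters.

    Hypothetical model of "all-except-reserved" normalization.
    """
    def repl(match: re.Match) -> str:
        ch = chr(int(match.group(1), 16))
        return match.group(0) if ch in RESERVED else ch

    return PCT.sub(repl, uri)


def escape_for_upstream(uri: str) -> str:
    """Re-escape "%" unless it starts an escape of a reserved character.

    Hypothetical model of the escaping applied before proxying.
    """
    out = []
    i = 0
    while i < len(uri):
        if uri[i] == "%":
            match = PCT.match(uri, i)
            if match and chr(int(match.group(1), 16)) in RESERVED:
                # Looks like an escaped reserved character: passed through.
                out.append(match.group(0))
                i += 3
                continue
            out.append("%25")
            i += 1
            continue
        out.append(uri[i])
        i += 1
    return "".join(out)


if __name__ == "__main__":
    for request_uri in ("/foo/a%252Fb", "/foo/a%2Fb"):
        normalized = decode_except_reserved(request_uri)
        print(request_uri, "->", escape_for_upstream(normalized))
```

In this model both request URIs end up identical after normalization, so the escaping step cannot tell which one the client actually sent, and the original "%252F" cannot be restored.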
Further, requests to static files with (properly escaped) reserved
characters will simply fail, because nginx won't decode these
characters. For example, in the following trivial configuration a
request to "/foo%3Fbar" won't be decoded, and so won't match the
"/foo?bar" file under the document root:
    location / {
        # static files
    }
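The same point as a small Python sketch. It assumes that the normalized URI is used directly as the path under the document root, and models the document root as a plain set of names. The helpers normalize() and static_file_exists() are hypothetical, not anything from nginx.

```python
import re
from urllib.parse import unquote

# RFC 3986 reserved set: gen-delims plus sub-delims.
RESERVED = set(":/?#[]@!$&'()*+,;=")
PCT = re.compile(r"%([0-9A-Fa-f]{2})")

# Stand-in for the document root: one file whose name contains "?".
DOCROOT_FILES = {"/foo?bar"}


def normalize(uri: str, mode: str = "all") -> str:
    """Percent-decode a URI path; the mode names follow the patch description."""
    if mode == "all":
        return unquote(uri)

    # "all-except-reserved": keep escapes of reserved characters as-is.
    def repl(match: re.Match) -> str:
        ch = chr(int(match.group(1), 16))
        return match.group(0) if ch in RESERVED else ch

    return PCT.sub(repl, uri)


def static_file_exists(uri: str, mode: str) -> bool:
    """Look up the normalized URI in the modelled document root."""
    return normalize(uri, mode) in DOCROOT_FILES


if __name__ == "__main__":
    for mode in ("all", "all-except-reserved"):
        print(mode, static_file_exists("/foo%3Fbar", mode))
```

With "all" the lookup finds the file; with "all-except-reserved" the path stays "/foo%3Fbar" and the lookup fails.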
Please also note that the configuration directive you've
introduced in this patch applies to URI parsing in the not-yet-final
server block (see [1] for details), but the configuration from the
final server block will be used for URI escaping. These
configurations can be different, and this might result in various
additional issues.
Overall, I tend to think that the suggested patch will introduce
many more problems than it tries to solve, and I would rather not
add it.
[1] http://nginx.org/en/docs/http/server_names.html#virtual_server_selection
--
Maxim Dounin
http://mdounin.ru/