[PATCH] HTTP: Add new uri_normalization_percent_decode option

Fri Feb 17 12:59:12 UTC 2023

Hello!

On Wed, Feb 15, 2023 at 11:50:13AM -0500, Michael Kourlas via nginx-devel wrote:

> # HG changeset patch
> # User Michael Kourlas <michael.kourlas at solace.com>
> # Date 1676408746 18000
> #      Tue Feb 14 16:05:46 2023 -0500
> # Node ID 129437ade41b14a584fb4b7558accc1b8dee7f45
> # Parent  cffaf3f2eec8fd33605c2a37814f5ffc30371989
> HTTP: Add new uri_normalization_percent_decode option
> 
> This patch addresses ticket #2225 by adding a new
> uri_normalization_percent_decode configuration option that controls which
> characters are percent-decoded by nginx as part of its URI normalization.
> 
> The option has two values: "all" and "all-except-reserved". "all" is the
> default value and is the current behaviour. When the option is set to
> "all-except-reserved", nginx percent-decodes all characters except those in the
> reserved set defined by RFC 3986:
> 
>       reserved    = gen-delims / sub-delims
> 
>       gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
> 
>       sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
>                   / "*" / "+" / "," / ";" / "="
> 
> In addition, when "all-except-reserved" is used, nginx will not re-encode "%"
> from the request URI when it observes that it is part of a percent-encoded
> reserved character.
> 
> When nginx percent-decodes reserved characters, this can often change the
> request URI's semantics, making it impossible to use a normalized URI for
> certain use cases. "uri_normalization_percent_decode" gives the configuration
> author the freedom to determine which reserved characters are semantically
> relevant and which are not.
> 
> For example, consider the following location block, which handles part of a
> hypothetical API:
> 
> location ~ ^/api/objects/[^/]+/subobjects(/.*)?$ {
>     ...
> }
> 
> Because nginx always normalizes "%2F" to "/", this location block will not
> match a path of /api/objects/sample%2Fname/subobjects, even if the API permits
> "/" to appear percent-encoded in the URI as part of object names. nginx will
> instead interpret this as /api/objects/sample/name/subobjects, a completely
> different path. Setting "uri_normalization_percent_decode" to
> "all-except-reserved" will leave "%2F" encoded, resulting in the expected
> behaviour.

Thanks for the patch.

As far as I understand, it will irreversibly corrupt URIs with 
double-encoded reserved characters.  For example, "%252F" will 
become "%2F", indistinguishable from an original "%2F", when 
proxying in the following configuration:

    location /foo/ {
        proxy_pass http://upstream/foo/;
    }
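
The collision can be illustrated with a minimal Python sketch.  It 
only models the decoding rule as described in the patch; 
decode_except_reserved() is a hypothetical helper, not nginx code, 
and the single-pass behaviour is an assumption:

```python
import re

# Reserved set from RFC 3986 (gen-delims plus sub-delims).
RESERVED = set(":/?#[]@!$&'()*+,;=")

def decode_except_reserved(uri):
    """Hypothetical model: decode each %XX unless it encodes a reserved character."""
    def repl(m):
        ch = chr(int(m.group(1), 16))
        # Keep the escape sequence untouched for reserved characters.
        return m.group(0) if ch in RESERVED else ch
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, uri)

# Two distinct request URIs...
single = decode_except_reserved("/foo/a%2Fb")    # an encoded "/"
double = decode_except_reserved("/foo/a%252Fb")  # an encoded "%2F"

# ...normalize to the same string, so the original cannot be recovered.
print(single, double, single == double)
```

Since "%" is not re-encoded in front of a reserved escape, both 
requests would reach the upstream with the same URI.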

Further, requests to static files with (properly escaped) reserved 
characters will simply fail, because nginx won't decode these 
characters.  For example, in the following trivial configuration a 
request to "/foo%3Fbar" won't be decoded to match the "/foo?bar" 
file under the document root:

    location / {
        # static files
    }
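
A rough Python sketch of the lookup, assuming a document root that 
contains a single file literally named "foo?bar" (the files_on_disk 
set is a hypothetical stand-in for the filesystem, and this is not 
how nginx itself maps URIs to files):

```python
from urllib.parse import unquote

# Hypothetical document root with one file literally named "foo?bar".
files_on_disk = {"/foo?bar"}

request_uri = "/foo%3Fbar"

# Current behaviour ("all"): the escaped "?" is decoded before the lookup.
decoded_all = unquote(request_uri)

# With "all-except-reserved", "%3F" encodes a reserved character
# and is therefore left as-is.
decoded_except_reserved = request_uri

print(decoded_all in files_on_disk)
print(decoded_except_reserved in files_on_disk)
```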

Please also note that the configuration directive you've 
introduced in this patch is taken from the not-yet-final server 
block when parsing the URI (see [1] for details), while the 
configuration from the final server block will be used for URI 
escaping.  These configurations can be different, and this might 
result in various additional issues.

Overall, I tend to think that the suggested patch will introduce 
many more problems than it tries to solve, and I would rather not.

[1] http://nginx.org/en/docs/http/server_names.html#virtual_server_selection

-- 
Maxim Dounin
http://mdounin.ru/