[PATCH] HTTP: Add new uri_normalization_percent_decode option

Tue Mar 28 14:08:52 UTC 2023

Hello!

On Mon, Mar 27, 2023 at 04:19:27PM +0000, Michael Kourlas via nginx-devel wrote:

> > As far as I understand, it will irreversibly corrupt URIs with
> > double-encoded reserved characters.  For example, "%252F" will
> > become "%2F" when proxying in the following configuration:
> >
> >     location /foo/ {
> >         proxy_pass http://upstream/foo/;
> >     }
> 
> I believe you are correct. When the "all-except-reserved" option is used, nginx
> will not recognize, during the re-encoding step, that the "%" in "%2F" was
> originally percent-encoded and not part of a percent-encoded "/".
> 
> However, I think this issue can be addressed by renaming "all-except-reserved"
> to "all-except-percent-and-reserved" and adding "%" to the corresponding set of
> characters that is not decoded or encoded automatically when that option is
> used. "%252F" will then be left untouched throughout the entire process.

So it will prevent access to the files with "%", such as in 
"foo%bar", unless specifically handled (see below).

> > Further, requests to static files with (properly escaped) reserved
> > characters will simply fail, because nginx won't decode these
> > characters.  For example, in the following trivial configuration a
> > request to "/foo%3Fbar" won't be decoded to match "/foo?bar" file
> > under the document root:
> >
> >     location / {
> >         # static files
> >     }
> 
> Yes, I agree it doesn't make sense to keep "%" and reserved characters
> percent-encoded when performing a filesystem lookup, because at that point we
> can be certain they're not being used as delimiters or anything other than
> data.
> 
> However, I think this issue can be addressed by decoding any remaining
> percent-encoded characters at the point of filesystem lookup, as well as any
> other place where you think it is appropriate. (Of course, this should only be
> done when using the "all-except-percent-and-reserved" option, and only with "%"
> and reserved characters, to avoid double-decoding double-encoded characters.)

This implies, basically, that there are 3 forms of the request 
URI: 1) fully encoded, as in $request_uri, 2) fully decoded, as in 
$uri now, and 3) "all-except-percent-and-reserved".  To implement this 
correctly, it needs clear definition when each form is used, and 
it is going to be a non-trivial task to do this safely.

For example, using fully decoded form for file system lookups 
while using "all-except-percent-and-reserved" form for location 
matching is going to be unsafe: restrictions like "location 
/private/ { deny all; }" can be easily bypassed.  The same issue 
applies to other cases not involving file system access directly 
in nginx: for example, consider $fastcgi_script_name in FastCGI 
proxying.

> > Please also note that the configuration directive you've
> > introduced in this patch applies to URI parsing from not-yet-final
> > server block (see [1] for details), but the configuration from the
> > final server block will be used for URI escaping.  These
> > configuration can be different, and this might result in various
> > additional issues.
> 
> I think this issue can be addressed by making the new directive available only
> in http blocks. This ensures that all virtual servers share the same
> configuration.

This approach basically precludes using different settings in 
different server blocks, even completely independent, such as 
IP-based virtual servers, and is not generally the way to go.  
Instead, this should be implemented correctly, if at all.

> If you agree with these suggestions, I'd be happy to submit an updated patch.
> 
> Reserved characters are often used as delimiters in APIs, and I believe it is
> important that nginx be able to distinguish between usages as delimiters and
> usages as data.

First of all, you may want to focus on the logic you are going to 
implement.

-- 
Maxim Dounin
http://mdounin.ru/