[PATCH] HTTP: Add new uri_normalization_percent_decode option
Maxim Dounin
mdounin at mdounin.ru
Tue Apr 4 03:26:06 UTC 2023
Hello!
On Mon, Apr 03, 2023 at 06:33:21PM +0000, Michael Kourlas via nginx-devel wrote:
> > Consider the following rewrite:
> >
> > rewrite ^/(.*) /$1 break;
> >
> > Assuming request to "GET /foo%2fbar/", what
> > $uri_encoded_percent_and_reserved do you expect after each of
> > these rewrites?
>
> I do not think that rewrite does anything in practice. Following the rewrite,
> I would expect $uri to remain unchanged at its current value of "/foo/bar/" and
> $uri_encoded_percent_and_reserved to similarly remain unchanged at its current
> value of "/foo%2fbar/".
How do you expect this to be detected by the code? Direct
comparison of the new URI as generated by the rewrite with the old
URI?
Bonus question: what happens with the following rewrite:
rewrite ^/(.*) /bazz/$1 break;
It is basically equivalent to the above except it certainly
changes the URI.
> > Similarly, consider the following rewrite:
> >
> > rewrite ^/foo/(.*) /$1 break;
> >
> > What $uri_encoded_percent_and_reserved is expected after the
> > rewrite?
>
> In this case the regular expression matches $uri but not
> $uri_encoded_percent_and_reserved. One could say that this just means that only
> $uri is updated, but that has the potential to cause confusion when a flag is
> used that changes the control flow (unless the user explicitly opts into this
> behaviour).
>
> This could be addressed by adding a "match-source" optional argument to
> "rewrite" with three values (and a default of "uri"):
> * "uri" - rewrite directive matches and changes $uri only
> * "uri_encoded_percent_and_reserved" - rewrite directive matches and changes
> $uri_encoded_percent_and_reserved only
> * "all" - rewrite directive matches and changes both (if only one is matched,
> directive is not applied)
>
> It might also be a good idea to add "uri_encoded_percent_and_reserved_regex"
> and "uri_encoded_percent_and_reserved_replacement" arguments to be used with
> "all", so that it is possible to use the same directive and flag even when
> needing to perform slightly different rewrites for $uri versus
> $uri_encoded_percent_and_reserved.
So, your original suggestion that "every transformation applied to
$uri ... is automatically applied to
$uri_encoded_percent_and_reserved as well" is no longer relevant,
correct?
Considering "match-source=uri", how do you expect the resulting
$uri_encoded_percent_and_reserved to be set? From the description
it looks like it is expected to remain unchanged, though for the
above example this will result in $uri being "/bar/" and
$uri_encoded_percent_and_reserved being "/foo%2fbar/", which looks
certainly wrong. Is it really the intention?
The same question applies to
"match-source=uri_encoded_percent_and_reserved".
> > Consider the following configuration:
> >
> > location /foo%2fbar/ match-source=uri-encoded-percent-and-reserved {
> > ...
> > }
> >
> > location /foo/bar/ match-source=uri {
> > ...
> > }
> >
> > The question is: which location is expected to be matched for the
> > request "GET /foo%2fbar/"?
>
> Both blocks match their respective variables. Since the first block has the
> longest matching prefix, I expect it will be selected.
That's not how prefix locations work: they are matched based on
the longest prefix, and order of locations does not matter (and
not even preserved/known during matching, since matching uses a
prefix tree). So your suggestion implies that ordered matching
should be introduced for prefix locations, correct?
> > Which location is expected to be matched for the request "GET
> > /foo%2Fbar/" (note that it is exactly equivalent to "GET
> > /foo%2fbar/").
>
> Only the second block matches its variable, so I expect it will be selected.
>
> Although paths are generally case sensitive, a percent-encoded character is not
> supposed to be, so this behaviour is unfortunate. One possibility is to
> automatically use case-insensitive matching for any part of a location prefix
> using "match-source=uri-encoded-percent-and-reserved" that is a percent
> encoded "%" or reserved character.
>
> Alternatively this behaviour could be documented with an instruction to use
> regular expressions instead.
You mean, only allow
"match-source=uri-encoded-percent-and-reserved" for locations
given by regular expressions, correct?
> > Assuming static handling in the locations, what happens with the
> > request "GET /foo%2fbar/..%2fbazz"?
>
> The first block would be used. However, $uri would be used for static
> handling, so the path "/foo/bazz" would be looked up on the filesystem, not
> "/foo%2fbar/..%2fbazz".
So it can be easily used to bypass security restrictions, such as
in:
location /foo%2fbar/ match-source=uri-encoded-percent-and-reserved {
}
location /admin/ {
allow 127.0.0.1;
deny all;
}
Note that the request "GET /foo%2fbar/..%2f..%2fadmin/secret" is
going to access protected files in <root>/admin/, while it
shouldn't be allowed to.
> > Note that the behaviour does not seem to be obvious, and it is an
> > open question if it can be clarified to be safe.
>
> Fair enough. I am certainly happy to continue making changes to my proposal to
> address the specific concerns you raise. However, are you saying that you have
> broader overall concerns about safety, complexity, etc. that make a patch
> implementing the proposal unlikely to be accepted, even if all specific
> concerns are addressed?
I've just asked some question on how your proposal is expected to
work - and summarized that the behaviour suggested is certainly
not obvious, since there are lots of questions on how it is
expected to behave in various edge cases.
But I certainly agree that there are issues with safety and
complexity in the proposal. And I don't think that addressing any
specific concerns will help here, since addressing them seem to
result in increased complexity and more concerns. It might be a
good idea to start from scratch instead of trying to fix the
proposal.
Just in case, below is the summary of what I think about the
topic.
Short version:
All solutions I'm aware of suck. Don't use encoded slashes in
URIs, it hurts.
Long version:
First of all, forget about "reserved characters". The only
character that really matters is slash ("/"), as this is the only
character which is indeed reserved in URI path nginx works with.
There are 3 basic approaches to handling encoded slashes in URIs,
mostly outlined in Apache's AllowEncodedSlashes directive (see
https://httpd.apache.org/docs/2.4/mod/core.html#allowencodedslashes
for details):
1. Reject them. This what Apache does by default (though with
questionable error code).
2. Decode them and assume equivalent to non-encoded slashes. This
is what nginx does. Apache does something similar with
"AllowEncodedSlashes On", but given the note in the directive
description it looks like it does not implement the "assume
equivalent" part.
3. Do not decode them and expect encoded slashes are corrupted if
URI is re-encoded. This also implies that if URI is not
re-encoded but proxied as is, restrictions like in "location
/admin/ { deny all; }" can be bypassed if slashes are decoded by
the backend server.
All of the approaches have their pros and cons, with the reject
one being the safest, while decode and do not decode resulting in
different forms of corruptions, and not decoding resulting in
potential security issues on proxying.
I don't think that trying to combine different approaches in the
normal location matching is going to work. Rather, this will
result in unmanageable complexity and will be unsafe, as in the
above proposal.
In nginx, the only implemented approach is "decode". It does,
however, also provides the original request URI in the
$request_uri variable, so the "reject" approach can be trivially
implemented in the configuration:
if ($request_uri ~* "%2f") {
return 400;
}
Similarly, as already mentioned in #2225, one can do something
like this:
if ($request_uri ~ "^/api/objects/[^/]+/subobjects(/.*)?$") {
...
}
Further, nginx makes it possible to proxy the request without any URI
modifications, as in "proxy_pass http://upstream;" (note no URI
component after the upstream server name), so requests with
encoded slashes can be inspected and proxied without corruption.
This might be already enough for most, if not all, tasks involving
encoded slashes: they can be handled without corruption if things
are properly configured. Given the above options, this is more
than usually available.
Note that $request_uri matching doesn't imply various
normalizations nginx usually does on URI before matching
locations, such as decoding encoded characters, so something like
"GET /%61pi/objects/..." won't be matched. This can be improved
by providing an additional variable, or by a smart enough
urldecode() function (see ticket #52), or by an arbitrary
normalization written in an embedded language, such as Perl or
njs. Or it might not worth the effort though, and just rejecting
such non-normalized requests would be a good enough solution for
most use cases.
Similarly, it might be possible to explicitly implement the "do
not decode" approach, with something like "decode_slashes off;"
(similarly to "merge_slashes off;"). I tend to think that it is
going to be just another "never use it, it is not secure"
directive, much like "merge_slashes", and the mere existence of
this directive is not going to help anyone. (Also, it might be
actually a good idea to remove "merge_slashes" instead.)
Bonus game:
The dot (".") character is not reserved in URIs, yet it has
special meaning when used in "/./" and "/../" constructs. And
escaping dots won't help: since "." is not reserved, "/%2e/" is
exactly equivalent to "/./", and it is in turn equivalent to "/".
As such, APIs designed with arbitrary object names in URIs and
assuming escaping is enough simply won't be able to work for these
names.
Hope this helps.
--
Maxim Dounin
http://mdounin.ru/
More information about the nginx-devel
mailing list