Danger to Nginx from raw unicode in paths?
David
xeoncross at gmail.com
Mon Jan 26 01:06:13 UTC 2015
I was recently wondering if I should filter URLs by character, only allowing
what is standard in applications: letters, numbers, and a couple of extra
characters [.-_/\]. We know the set of characters supported in URLs and
domains is really just a subset of ASCII
<http://perishablepress.com/stop-using-unsafe-characters-in-urls/>.
However, I'm not totally sure what nginx does when I pass "µ" to it.
I came up with a simple regular expression to match something that isn't
one of those:
location ~* "(*UTF8)([^\p{L}\p{N}/.\-%\\]+)" {
or, as a check inside an existing location:
if ($uri ~* "(*UTF8)([^\p{L}\p{N}/.\-%\\]+)") {
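For context, here is a minimal sketch of how such a filter might be wired up. The 444 response and the exact character class are my own assumptions, not tested advice:

```nginx
# Reject any request whose URI contains a character outside the whitelist:
# Unicode letters (\p{L}), digits (\p{N}), and the literals / . - % \
# (*UTF8) tells PCRE to interpret the subject string as UTF-8.
location ~* "(*UTF8)([^\p{L}\p{N}/.\-%\\]+)" {
    return 444;  # nginx-specific: close the connection without a response
}
```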
However, I'm wondering if I actually need the UTF-8 matching, since clients
should default to percent-encoding the bytes (%20) or hex-escaping them
(\x23), and the actual transfer is just bytes anyway.
Here is an example test where I piped almost all 65,000 Unicode code points
to nginx via curl:
https://gist.github.com/Xeoncross/acca3f09c5aeddac8c9f
For example: $ curl -v http://localhost/与
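As a point of comparison, a client that does percent-encode the path would send the UTF-8 bytes of "与" escaped rather than raw; a quick sketch in Python:

```python
from urllib.parse import quote, unquote

# "与" (U+4E0E) encodes to three UTF-8 bytes: E4 B8 8E.
path = "与"
encoded = quote(path)
print(encoded)           # %E4%B8%8E
print(unquote(encoded))  # 与

# curl, by contrast, passes the path bytes through as given on the
# command line, so nginx receives the raw UTF-8 bytes unescaped.
```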
Basically, is there any point in watching URLs for non-standard sequences
that might indicate an attack?
( FYI: I posted more details that led to this question here:
http://stackoverflow.com/questions/28055909/does-nginx-support-raw-unicode-in-paths
)