Danger to Nginx from raw unicode in paths?
jgehrcke at googlemail.com
Mon Jan 26 14:41:55 UTC 2015
Regarding your mail subject, one should note that "raw unicode" does
not exist. You should really understand what the term "Unicode" means,
what the abstract meaning of Unicode code points is, and what UTF-8,
for example, really is: it is just one of many possible ways to encode
characters into a raw byte representation. Again: there is no such
thing as "raw unicode".
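
To make the distinction concrete, here is a minimal Python sketch (my own
illustration; the choice of character is arbitrary). It takes one abstract
code point and shows that its byte representation depends entirely on the
encoding chosen:

```python
# One abstract code point, several possible byte representations.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, code point U+00E9

utf8 = ch.encode("utf-8")        # two bytes under UTF-8
utf16 = ch.encode("utf-16-be")   # two different bytes under UTF-16 (big endian)
latin1 = ch.encode("latin-1")    # a single byte under ISO-8859-1

print(hex(ord(ch)))   # 0xe9  (the code point, an abstract number)
print(utf8.hex())     # c3a9
print(utf16.hex())    # 00e9
print(latin1.hex())   # e9
```

What arrives at the server is always one of these byte sequences, never
"the Unicode character" itself.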
Other than that, you have already received a good answer on Stack
Overflow. So, what is your question, exactly?
As stated on SO, for nginx, a location is just a sequence of bytes. You
surely understand that the space of byte sequences (given a certain
length) is far larger than just the 65,000 items that you have worked with.
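
A quick back-of-the-envelope sketch of that size argument, in Python. The
helper name `count_byte_sequences` is made up for this mail, and the count
is simply every possible byte string of a given length, ignoring any bytes
that HTTP or nginx might reject:

```python
def count_byte_sequences(length: int) -> int:
    """Number of distinct byte strings of exactly `length` bytes."""
    return 256 ** length

for n in (1, 2, 3, 4):
    print(n, count_byte_sequences(n))
# 1 256
# 2 65536
# 3 16777216
# 4 4294967296
```

Two bytes already give about as many sequences as the items you tested,
and every additional byte multiplies the count by 256.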
From my naive point of view I would say: no, there definitely is no
point in looking out for "non-standard" sequences in the most general
sense, because there are just too many of them. A proper whitelist
approach (specify those locations that *should* work in a certain
way, and reject all other requests) is a very safe concept.
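
The whitelist idea can be sketched in a few lines of Python. This is only
an illustration of the concept, not nginx configuration; the paths in
`ALLOWED_PATHS` and the function name `is_allowed` are hypothetical:

```python
# Hypothetical set of request paths that are supposed to work.
ALLOWED_PATHS = frozenset({
    b"/",
    b"/index.html",
    b"/static/app.css",
})

def is_allowed(raw_path: bytes) -> bool:
    """Accept only exact, known byte sequences; everything else is rejected."""
    return raw_path in ALLOWED_PATHS

print(is_allowed(b"/index.html"))          # True
print(is_allowed(b"/index.html\xc3\xa9"))  # False
print(is_allowed(b"/%c0%afetc/passwd"))    # False
```

The check never has to know what an "odd" sequence looks like. Anything
that is not explicitly listed is rejected, no matter which bytes it
contains.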