Danger to Nginx from raw unicode in paths?

Jan-Philip Gehrcke jgehrcke at googlemail.com
Mon Jan 26 14:41:55 UTC 2015


In reference to your mail subject, one should note that "raw unicode" 
does not exist. You should really understand what the term "Unicode" 
means, what the abstract meaning of Unicode code points is, and what 
UTF-8, for example, really is: just one of many possible ways to 
encode code points into a raw byte representation. Again, there is no 
such thing as "raw unicode".
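
To make the distinction concrete, here is a minimal Python sketch. The 
character and the path "/café" are just examples I picked for 
illustration; they are not taken from the original question. It shows one 
abstract code point turning into different byte sequences depending on 
the encoding, and what a client would typically put on the wire when the 
path is percent-encoded as UTF-8.

```python
from urllib.parse import quote

# One abstract code point: U+00E9, LATIN SMALL LETTER E WITH ACUTE.
char = "\u00e9"
code_point = ord(char)

# The same code point in three different byte encodings.
utf8_bytes = char.encode("utf-8")
utf16_bytes = char.encode("utf-16-be")
latin1_bytes = char.encode("latin-1")

print(hex(code_point))  # 0xe9
print(utf8_bytes)       # b'\xc3\xa9'
print(utf16_bytes)      # b'\x00\xe9'
print(latin1_bytes)     # b'\xe9'

# Hypothetical example path; quote() percent-encodes the UTF-8 bytes.
encoded_path = quote("/caf\u00e9")
print(encoded_path)     # /caf%C3%A9
```

None of these byte sequences is "the" Unicode form of the character; 
each is just one encoding of it.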

Other than that, you have already received a good answer on Stack 
Overflow. So, what is your question, exactly?

As stated on SO, for nginx, a location is just a sequence of bytes. You 
surely understand that the space of byte sequences (given a certain 
length) is larger than just the 65,000 items that you have worked with.
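
As a rough illustration of how quickly that space grows, the following 
Python sketch counts all possible byte sequences of a given length. It 
deliberately ignores any restrictions a real HTTP request line imposes 
on which bytes may appear, so treat the numbers as an upper bound, not 
as a statement about what nginx accepts. The function name is my own.

```python
def count_byte_sequences(length: int) -> int:
    """Number of distinct byte strings of exactly `length` bytes."""
    return 256 ** length

for n in (1, 2, 3, 4):
    print(n, count_byte_sequences(n))
# 1 256
# 2 65536
# 3 16777216
# 4 4294967296
```

Already at three bytes there are millions of candidates, so enumerating 
"suspicious" inputs does not scale.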

From my naive point of view I would say: no, there definitely is no 
point in looking out for "non-standard" sequences in the most general 
sense, because there are just too many of them. A proper whitelist 
approach (specify those locations that *should* work in a certain 
way, and reject all other requests) is a very safe concept.
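
The whitelist idea can be sketched in a few lines of Python. This is not 
nginx configuration and not how nginx matches locations internally; it 
only models the concept. The paths in ALLOWED_LOCATIONS and the function 
name are hypothetical. In nginx itself one would presumably express the 
same thing with exact-match location blocks plus a catch-all that 
rejects everything else.

```python
# Hypothetical set of locations that *should* work, as raw bytes.
ALLOWED_LOCATIONS = {b"/", b"/index.html", b"/static/app.css"}

def status_for(path: bytes) -> int:
    """Return 200 for whitelisted paths and 404 for everything else."""
    return 200 if path in ALLOWED_LOCATIONS else 404

print(status_for(b"/index.html"))                 # 200
print(status_for("/caf\u00e9".encode("utf-8")))   # 404
print(status_for(b"/\xff\xfe"))                   # 404
```

The point of the design is that unexpected byte sequences never need to 
be recognized as such: anything not explicitly listed is rejected, no 
matter how it is encoded.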



