nginx doesn't handle different URL encodings well

Maxim Dounin mdounin at mdounin.ru
Thu Oct 21 11:29:24 MSD 2010


Hello!

On Thu, Oct 21, 2010 at 03:23:46AM +0200, Pierre-Marie Baty wrote:

> 
> Hello Igor, hello all,
>  
> Congratulations on your fantastic and neatly programmed web 
> server. It's a pleasure to use it.
>  
> I have a problem with nginx not serving files with accented 
> characters when the submitted URL is UTF-8 encoded.
>  
> Here is my nginx.conf: http://nginx.pastebin.com/aB7XRLM3
> It's a home webserver that is primarily used to serve things 
> like holiday photos.
>  
> For example, I have a file called "été-2008.jpg" on my 
> webserver. When I request http://myserver/été-2008.jpg, 
> whether the file is served depends on whether the "Always 
> send URLs as UTF-8" checkbox is checked in the Internet 
> Explorer advanced options.
>  
> When the URL is Latin-1 encoded, the request sent is: GET 
> /%e9t%e9-2008.jpg ----> nginx resolves this to "été-2008.jpg", 
> and the file is served. OK.
> When the URL is UTF-8 encoded, the request sent is: GET 
> /%C3%A9t%C3%A9-2008.jpg ----> nginx resolves this to 
> "Ã©tÃ©-2008.jpg", and the file is not served (file not found).
>
> Shouldn't a fallback mechanism be implemented, so that when a 
> file isn't found after a URL has been decoded, a second try is 
> made with another encoding? I believe two RFCs are involved: 
> RFC2396 and RFC3986 (info given by PiotrSikora on IRC). IMO, 
> nginx shouldn't assume the URLs it gets always follow the 
> same RFC. From what I know, this ambiguity is resolved in 
> Apache. Maybe it has that sort of fallback mechanism.
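A minimal Python sketch of what the two requests in the report decode to. It models only percent-decoding (urllib.parse.unquote_to_bytes), not nginx itself, and it assumes, as the report implies, that the file names on the server's disk are stored as Latin-1 bytes.

```python
from urllib.parse import unquote_to_bytes

# The two request paths from the report.
latin1_request = "/%e9t%e9-2008.jpg"      # IE with "Always send URLs as UTF-8" off
utf8_request = "/%C3%A9t%C3%A9-2008.jpg"  # IE with the option on

# Percent-decoding recovers raw bytes only; the URL carries no charset label.
latin1_bytes = unquote_to_bytes(latin1_request)  # b'/\xe9t\xe9-2008.jpg'
utf8_bytes = unquote_to_bytes(utf8_request)      # b'/\xc3\xa9t\xc3\xa9-2008.jpg'

# Interpreted as Latin-1 (the assumed charset of the file names on disk):
latin1_name = latin1_bytes.decode("latin-1")     # '/été-2008.jpg', matches the file
utf8_as_latin1 = utf8_bytes.decode("latin-1")    # '/Ã©tÃ©-2008.jpg', no such file

# The second request names the same file only when its bytes are read as UTF-8.
utf8_as_utf8 = utf8_bytes.decode("utf-8")        # '/été-2008.jpg'
assert utf8_as_utf8 == latin1_name
```

Both requests are valid; they simply carry different bytes, and nothing in the URL says which charset those bytes are in.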

The only difference between RFC2396 and RFC3986 relevant to the 
question is that the latter recommends using UTF-8 for new URI 
schemes.  There is no ambiguity between the two: the character set 
for non-US-ASCII characters in http URLs isn't defined (though most 
browsers nowadays use UTF-8 by default).

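The only solution is to provide correct URLs, i.e. links that are 
already percent-encoded in the charset your file names use, so the 
browser has no encoding choice to make.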
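A small Python sketch of generating such pre-encoded links with urllib.parse.quote. Latin-1 is assumed as the on-disk charset, as in the report; the link text "holiday photo" is purely illustrative. Browsers generally send an already %XX-encoded path unchanged, so the "Always send URLs as UTF-8" setting, which concerns raw non-ASCII characters, should no longer matter for such links.

```python
from urllib.parse import quote

name = "été-2008.jpg"

# Percent-encode the name in the charset the file names on disk use
# (Latin-1 is an assumption here, matching the report).
latin1_link = "/" + quote(name, encoding="latin-1")  # '/%E9t%E9-2008.jpg'

# For comparison: the same name percent-encoded as UTF-8, which would only
# be correct if the file names on disk were UTF-8.
utf8_link = "/" + quote(name, encoding="utf-8")      # '/%C3%A9t%C3%A9-2008.jpg'

# An HTML link containing only ASCII, so no browser-side encoding is needed.
print('<a href="%s">holiday photo</a>' % latin1_link)
```

Which of the two forms is "correct" depends solely on how the files are named on the server, not on the client.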

If you think that a "fallback mechanism" is a good idea, you may 
implement one with the "try_files" directive and the embedded perl 
module to do the recoding between Latin1 and UTF-8.  Note though 
that this may lead to unexpected results: "/%C3%A9" may be Latin1 
"/Ã©" as well as UTF-8 "/é".
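The suggestion is try_files plus the embedded perl module; the Python sketch below does not reproduce that nginx configuration. It only models the lookup order such a fallback would follow, assuming Latin-1 file names on disk, and the helpers candidate_paths and resolve_with_fallback are hypothetical names for illustration. It also makes the "/%C3%A9" ambiguity concrete.

```python
import os
from urllib.parse import unquote_to_bytes


def candidate_paths(request_path):
    """Hypothetical helper: byte paths to try for a request, in order.

    1. The percent-decoded bytes exactly as sent.
    2. If those bytes are valid UTF-8 and the text fits in Latin-1,
       the same text re-encoded as Latin-1.
    """
    raw = unquote_to_bytes(request_path)
    yield raw
    try:
        recoded = raw.decode("utf-8").encode("latin-1")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return  # not UTF-8, or not representable in Latin-1: no fallback
    if recoded != raw:  # pure ASCII paths are identical in both charsets
        yield recoded


def resolve_with_fallback(docroot, request_path):
    """Hypothetical helper: first existing file under docroot (bytes), or None."""
    for path in candidate_paths(request_path):
        # Crude guard only; nginx normalizes URIs itself, this sketch does not.
        if b".." in path.split(b"/"):
            continue
        full = os.path.join(docroot, path.lstrip(b"/"))
        if os.path.isfile(full):
            return full
    return None


# The ambiguity: "/%C3%A9" yields two candidates, Latin-1 "/Ã©" first and
# the recoded "/é" second; if both files exist, the first one wins.
print(list(candidate_paths("/%C3%A9")))  # [b'/\xc3\xa9', b'/\xe9']
print(list(candidate_paths("/%e9")))     # [b'/\xe9']
```

Trying the bytes as sent first keeps the existing behaviour for Latin-1 clients; the recoded path is only tried when that lookup fails, which is exactly where two different files could both match.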

Maxim Dounin


