nginx doesn't handle different URL encodings well

Pierre-Marie Baty baty.pm at hotmail.fr
Fri Oct 22 01:09:59 MSD 2010




> From: edhoprima at gmail.com
> Date: Thu, 21 Oct 2010 23:45:06 +0700
> To: nginx at nginx.org
> Subject: Re: nginx doesn't handle different URL encodings well
> 
> On Thu, Oct 21, 2010 at 8:57 AM, helen <nginx-forum at nginx.us> wrote:
> > On Wed, 20 Oct 2010 21:23:46 -0400, Pierre-Marie Baty wrote:
> >
> >> When the URL is Latin-1 encoded, the request sent is: GET
> >> /%e9t%e9-2008.jpg ----> nginx resolves this to "été-2008.jpg", the
> >> file is served, OK
> >> When the URL is UTF-8 encoded, the request sent is: GET
> >> /%C3%A9t%C3%A9-2008.jpg ----> nginx resolves this to
> >> "été-2008.jpg", and the file is not served. (file not found)
> >
> 
> except that it works the exact reverse on my side. Are you sure the
> filename is stored in UTF-8 in the filesystem?
> 
> setting LANG to en_US.UTF-8 may help (e.g. "LANG=en_US.UTF-8 ls" in a
> bash shell)

Thanks for the tip. I followed your advice and tried many locale combinations today.

Unfortunately, none of them helped. I can't use UTF-8 as my locale because FreeBSD's FFS has no support for multibyte filenames. So if I want the system "ls" command to output "été-2008.jpg" and not something weird, I have to use one of the 8-bit locales. Currently my LANG is fr_FR.ISO8859-15 (essentially Latin-1 plus the euro sign).

OK, let's sum up:

- nginx does no charset translation: the percent-decoded bytes of the URL are passed directly to the filesystem as the file name
- the new standards say that URLs are going to be sent UTF-8 encoded
- UTF-8 is a multibyte encoding scheme
- my server's filesystem supports several encoding schemes but not multibyte ones, and thus it doesn't support UTF-8.
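
To make the mismatch concrete, here is a small Python sketch (not nginx code, just an illustration) that percent-decodes the two request paths quoted above. It only assumes that the file name on disk holds the single-byte form of "é"; the variable names are mine.

```python
from urllib.parse import unquote_to_bytes

# The two request paths quoted earlier in this thread.
latin1_request = "/%e9t%e9-2008.jpg"
utf8_request = "/%C3%A9t%C3%A9-2008.jpg"

# Percent-decoding yields raw bytes; no charset conversion happens here.
latin1_bytes = unquote_to_bytes(latin1_request)
utf8_bytes = unquote_to_bytes(utf8_request)

print(latin1_bytes)  # b'/\xe9t\xe9-2008.jpg'
print(utf8_bytes)    # b'/\xc3\xa9t\xc3\xa9-2008.jpg'

# Both byte strings spell the same name when read with their own charset...
same_name = latin1_bytes.decode("iso8859-15") == utf8_bytes.decode("utf-8")
# ...but they are different byte sequences, so a byte-for-byte file lookup
# can match only one of them.
same_bytes = latin1_bytes == utf8_bytes
print(same_name, same_bytes)  # True False
```

In UTF-8 each "é" takes two bytes (C3 A9) instead of one (E9), which is why the second request does not match a file whose name was written under an 8-bit locale.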

I guess I'll have to go down the painful URL rewrite route. What a pity...

I'm quite new to nginx. Could someone suggest a configuration syntax for doing this?
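
To be clear about what I mean by "rewrite", here is the translation sketched in Python. This is not an nginx configuration, and the function name utf8_path_to_latin9 is made up for the example. It assumes the files on disk are named in ISO8859-15 and that any path which does not decode as UTF-8 is already in that 8-bit form.

```python
from urllib.parse import quote, unquote_to_bytes


def utf8_path_to_latin9(path: str) -> str:
    """Return the percent-encoded path transcoded from UTF-8 to ISO8859-15.

    Hypothetical helper: paths that are not valid UTF-8, or that contain
    characters with no ISO8859-15 equivalent, are returned unchanged.
    """
    raw = unquote_to_bytes(path)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: assume the client already sent the 8-bit form.
        return path
    try:
        # Re-encode as single bytes and percent-encode them again.
        return quote(text.encode("iso8859-15"), safe="/")
    except UnicodeEncodeError:
        # No 8-bit equivalent exists, so nothing useful can be done.
        return path


print(utf8_path_to_latin9("/%C3%A9t%C3%A9-2008.jpg"))  # /%E9t%E9-2008.jpg
print(utf8_path_to_latin9("/%e9t%e9-2008.jpg"))        # /%e9t%e9-2008.jpg
```

The guess is not watertight: some 8-bit names happen to be valid UTF-8 byte sequences as well, so a real solution would need a rule for that ambiguity.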

-- 
Pierre-Marie Baty

