nginx doesn't handle different URL encodings well

Maxim Dounin mdounin at mdounin.ru
Fri Oct 22 03:22:00 MSD 2010


Hello!

On Thu, Oct 21, 2010 at 11:09:59PM +0200, Pierre-Marie Baty wrote:

[...]

> > setting LANG to en_US.UTF-8 may help (e.g. "LANG=en_US.UTF-8 ls"
> > in a bash shell)
> 
> Thanks for the tip. I followed your advice and tried many locale 
> combinations today.
> 
> Unfortunately none of them helped. I can't use UTF-8 as locale 
> because FreeBSD's FFS has no support for multibyte filenames. So 
> if I want the system "ls" command to output "été-2008.jpg" and 
> not something weird, I have to use one of the 8-bit locales. 
> Currently my LANG is fr_FR.ISO8859-15 (same as Latin-1 plus the 
> €uro sign).

No, you don't have to.  Though you do have to create the files under 
the locale you set.  File names are just bytes, and the locale defines 
the charset that will be used to output them.
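
To make the "just bytes" point concrete, here is a minimal Python 
sketch.  It only illustrates the idea and is not anything nginx or the 
filesystem runs; the helper names on_disk_bytes and displayed are made 
up for this example.  It shows the bytes that "été-2008.jpg" becomes 
under two charsets, and what a tool would print when it interprets 
those bytes under a different one.

```python
# Illustration only: a file name is a byte string; the charset used to
# display it decides what characters you see.
NAME = "été-2008.jpg"


def on_disk_bytes(name: str, charset: str) -> bytes:
    """Bytes stored as the file name when it is created under `charset`."""
    return name.encode(charset)


def displayed(raw: bytes, charset: str) -> str:
    """What a tool such as ls would show if it assumed `charset`."""
    return raw.decode(charset, errors="replace")


latin9_bytes = on_disk_bytes(NAME, "iso8859-15")
utf8_bytes = on_disk_bytes(NAME, "utf-8")

print(latin9_bytes)                         # b'\xe9t\xe9-2008.jpg'
print(utf8_bytes)                           # b'\xc3\xa9t\xc3\xa9-2008.jpg'
print(displayed(utf8_bytes, "utf-8"))       # été-2008.jpg
print(displayed(utf8_bytes, "iso8859-15"))  # Ã©tÃ©-2008.jpg
```

Both byte strings are equally valid file names; only the interpretation 
at display time differs.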

> OK, let's sum up :
> 
> - nginx does no translation and the URL is directly passed as a request to the filesystem

Correct.

> - the new standards say that URLs are going to be sent UTF-8 encoded

Not correct.  It's not what standards say, it's just what modern 
browsers usually do by default.

> - UTF-8 is a multibyte encoding scheme

Correct.

> - my server's filesystem supports several encoding schemes but 
> not multibyte ones, and thus it doesn't support UTF-8.

Not correct.  Your server is character set agnostic, as is nginx.

> I guess I'll have to go down the painful URL rewrite way. What a pity...

As I already explained, this isn't going to help in all cases.  
The only safe approach is to use urlencoded links.
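
As a rough sketch of what urlencoded links look like (Python, for 
illustration only; the function name urlencoded_link is hypothetical), 
the link is built by percent-encoding the same bytes the file name has 
on disk.  Since the URL is passed through without translation, the 
decoded request then matches the file name byte for byte, regardless 
of what a browser would otherwise assume.

```python
from urllib.parse import quote, unquote_to_bytes

NAME = "été-2008.jpg"


def urlencoded_link(name: str, fs_charset: str) -> str:
    """Percent-encode `name` using the charset the file was created under."""
    return quote(name, encoding=fs_charset)


link_latin9 = urlencoded_link(NAME, "iso8859-15")
link_utf8 = urlencoded_link(NAME, "utf-8")

print(link_latin9)  # %E9t%E9-2008.jpg
print(link_utf8)    # %C3%A9t%C3%A9-2008.jpg

# Decoding the link gives back exactly the bytes of the file name.
print(unquote_to_bytes(link_latin9) == NAME.encode("iso8859-15"))  # True
```

Which of the two links is the right one depends only on the bytes the 
file name actually has on disk.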

Maxim Dounin


