nginx doesn't handle different URL encodings well
Maxim Dounin
mdounin at mdounin.ru
Fri Oct 22 03:22:00 MSD 2010
Hello!
On Thu, Oct 21, 2010 at 11:09:59PM +0200, Pierre-Marie Baty wrote:
[...]
> > setting LANG to en_US.UTF-8 may help (e.g. "LANG=en_US.UTF-8 ls"
> > in a bash shell).
>
> Thanks for the tip. I followed your advice and tried many locale
> combinations today.
>
> Unfortunately none of them helped. I can't use UTF-8 as locale
> because FreeBSD's FFS has no support for multibyte filenames. So
> if I want the system "ls" command to output "été-2008.jpg" and
> not something weird, I have to use one of the 8-bit locales.
> Currently my LANG is fr_FR.ISO8859-15 (same as Latin-1 plus the
> €uro sign).
No, you don't have to. You do have to create the files under the
locale you set, though. File names are just bytes, and the locale
defines the charset that will be used to display them.
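To illustrate the point that file names are just bytes, here is a minimal Python sketch. It takes the file name from the example above and the two charsets discussed in this thread; the variable names are only illustrative.

```python
# The same file name, encoded under two different charsets.
name = "été-2008.jpg"

latin9_bytes = name.encode("iso8859-15")  # bytes a Latin-9 locale would store
utf8_bytes = name.encode("utf-8")         # bytes a UTF-8 locale would store

print(latin9_bytes)  # b'\xe9t\xe9-2008.jpg'
print(utf8_bytes)    # b'\xc3\xa9t\xc3\xa9-2008.jpg'

# A terminal decodes whatever bytes are stored using the current locale's
# charset, so UTF-8 bytes viewed under a Latin-9 locale look "weird".
print(utf8_bytes.decode("iso8859-15"))  # Ã©tÃ©-2008.jpg
```

The filesystem stores either byte sequence without complaint; only the display depends on the locale.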
> OK, let's sum up:
>
> - nginx does no translation and the URL is directly passed as a request to the filesystem
Correct.
> - the new standards say that URLs are going to be sent UTF-8 encoded
Not correct. It's not what standards say, it's just what modern
browsers usually do by default.
> - UTF-8 is a multibyte encoding scheme
Correct.
> - my server's filesystem supports several encoding schemes but
> not multibyte ones, and thus it doesn't support UTF-8.
Not correct. Your server is character set agnostic, as is nginx.
> I guess I'll have to go down the painful URL rewrite way. What a pity...
As I already explained, this isn't going to help in all cases.
The only safe approach is to use urlencoded links.
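As a rough sketch of what generating urlencoded links could look like, the Python snippet below percent-encodes the file name using the charset the file was created under. It assumes the pages are generated by a script that knows that charset; the variable names are hypothetical.

```python
from urllib.parse import quote, unquote_to_bytes

name = "été-2008.jpg"

# Percent-encode using the charset the file name was created under on disk.
link_latin9 = quote(name, encoding="iso8859-15")
# For comparison: the form a browser would typically build by default.
link_utf8 = quote(name, encoding="utf-8")

print(link_latin9)  # %E9t%E9-2008.jpg
print(link_utf8)    # %C3%A9t%C3%A9-2008.jpg

# Decoding the link yields exactly the bytes that are looked up on disk.
print(unquote_to_bytes(link_latin9))  # b'\xe9t\xe9-2008.jpg'
```

With the link already percent-encoded, the browser has no need to choose an encoding itself, and the decoded bytes match the name on disk.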
Maxim Dounin