Setting Charset on Nginx PHP virtual host

Francis Daly francis at daoine.org
Fri Aug 2 15:05:54 UTC 2019


On Fri, Aug 02, 2019 at 03:11:05PM +0200, Vincent M. wrote:

Hi there,

> So I tried in http with empty charset_map:
>         charset_map iso-8859-1 utf-8 { }
> But special characters like é are displayed with ?

It seems to work for me as-is. What is different for you?

"work for me" means "the utf-8 character é becomes the 6 characters
&#233;, which the html-viewer is expected to display as LATIN SMALL
LETTER E WITH ACUTE".
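
To illustrate what a numeric character reference looks like, here is a small Python sketch. It uses Python's own "xmlcharrefreplace" error handler and an ASCII target purely for demonstration; it is not nginx code, and the function name is made up for this example.

```python
# Sketch only: mimics the "no mapping available" fallback using Python's
# built-in "xmlcharrefreplace" handler. This is not how nginx is implemented.
def to_char_refs(text: str) -> bytes:
    """Encode to ASCII, turning each non-ASCII character into &#NNN;."""
    return text.encode("ascii", errors="xmlcharrefreplace")


if __name__ == "__main__":
    print(to_char_refs("little e: é; big E: É"))
```

The decimal number in each reference is the Unicode code point of the character, so é (U+00E9) becomes the reference with 233.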

nginx.conf:
===
http {
    charset_map iso-8859-1 utf-8 { }
    server {
        listen 9876;
        charset utf-8;
    }
    server {
        listen 9877;
        charset iso-8859-1;
        override_charset on;
        location /x/ {
                proxy_pass http://127.0.0.1:9876/;
        }
    }
}
===

$ cat html/a/index.html
little e: é; big E: É
$ od -bc html/a/index.html
0000000 154 151 164 164 154 145 040 145 072 040 303 251 073 040 142 151
          l   i   t   t   l   e       e   :     303 251   ;       b   i
0000020 147 040 105 072 040 303 211 040 012
          g       E   :     303 211      \n
0000031

$ curl -i http://127.0.0.10:9876/a/ # headers edited
HTTP/1.1 200 OK
Server: nginx/1.17.2
Content-Type: text/html; charset=utf-8

little e: é; big E: É

$ curl -i http://127.0.0.10:9877/x/a/ # headers edited
HTTP/1.1 200 OK
Server: nginx/1.17.2
Content-Type: text/html; charset=iso-8859-1

little e: &#233;; big E: &#201;


And when I change nginx.conf to include a partial "correct" charset map:

===
    charset_map iso-8859-1 utf-8 {
        E9  C3A9;
    }
===

$ curl -i http://127.0.0.10:9877/x/a/
HTTP/1.1 200 OK
Server: nginx/1.17.2
Content-Type: text/html; charset=iso-8859-1

little e: �; big E: &#201;

$ curl -i http://127.0.0.10:9877/x/a/ | tail -n 1 | od -bc
0000000 154 151 164 164 154 145 040 145 072 040 351 073 040 142 151 147
          l   i   t   t   l   e       e   :     351   ;       b   i   g
0000020 040 105 072 040 046 043 062 060 061 073 040 012
              E   :       &   #   2   0   1   ;      \n
0000034

The utf-8 e-acute was changed to the correct iso-8859-1 octet (octal
351/hex e9/decimal 233), which my terminal renders as "unknown" because
it is invalid utf-8.
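
If you want to convince yourself of that, a short Python check along these lines should do. The helper name is invented for this sketch; only standard codecs are used.

```python
# Sketch: the single octet 0xE9 is a valid iso-8859-1 character,
# but is not valid utf-8 on its own.
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as utf-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    octet = bytes([0xE9])  # same value as octal 351 / decimal 233
    print(octet.decode("iso-8859-1"))       # the e-acute character
    print(is_valid_utf8(octet + b";"))      # what the terminal received
    print(is_valid_utf8("é".encode("utf-8")))
```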

> Where to find a charset_map?

It should not be necessary, according to the nginx docs, due to the
html-replacement; but if you want one, you can find-or-create one.

Basically, every octet from A0 to BF maps to the utf-8 equivalent from
C2A0 to C2BF, and every octet from C0 to FF maps to C380 to C3BF.

The format matches the three example charset-map files that nginx
provides.
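
If you would rather create one than find one, something like the following Python sketch could generate it. It covers only the A0 to FF range mentioned above, it assumes the "XX  YYYY;" line layout is acceptable to nginx, and the function name is made up here; check the output against the shipped example files before relying on it.

```python
# Sketch: print a charset_map block for iso-8859-1 -> utf-8, octets A0..FF.
# The range and the exact spacing are assumptions taken from this thread.
def latin1_to_utf8_map() -> str:
    lines = ["charset_map iso-8859-1 utf-8 {"]
    for octet in range(0xA0, 0x100):
        # Decode the single octet as iso-8859-1, re-encode as utf-8.
        utf8 = bytes([octet]).decode("iso-8859-1").encode("utf-8")
        lines.append(f"    {octet:02X}  {utf8.hex().upper()};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(latin1_to_utf8_map())
```

Redirect the output to a file and include it in the http block, or paste it in place of the empty charset_map.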

Oh - as one other wrinkle -- it is possible that the visual character
e-acute is *not* sent as the octets C3A9, but is instead sent as the
octets 65CC81 (e, followed by a combining acute accent) -- and off-hand,
I don't know how nginx will convert that. Possibly e&#769;, which might not
render very nicely in your html viewer.
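
The two forms can be compared with Python's unicodedata module. This is only a sketch of the Unicode side of things; it says nothing about what nginx actually does with the combining form.

```python
# Sketch: precomposed vs. decomposed e-acute, and their utf-8 octets.
import unicodedata

precomposed = "\u00e9"                                   # single code point
decomposed = unicodedata.normalize("NFD", precomposed)   # e + U+0301


def utf8_hex(text: str) -> str:
    """Return the utf-8 octets of text as uppercase hex."""
    return text.encode("utf-8").hex().upper()


if __name__ == "__main__":
    print(utf8_hex(precomposed))
    print(utf8_hex(decomposed))
    # Python's own fallback for the combining accent, for comparison only:
    print(decomposed.encode("iso-8859-1", errors="xmlcharrefreplace"))
```

If the upstream content turns out to be in the decomposed form, normalizing it to NFC before it reaches nginx would give the single C3A9 sequence again.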

But before you worry about that extra wrinkle, see what octets are sent,
and see where the problem comes in that makes something show as the ?
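
If od is not to hand, a few lines of Python can list the non-ASCII octets in a saved response body instead. The function name and the sample bytes are just for illustration.

```python
# Sketch: report the offset and hex value of every non-ASCII octet.
def non_ascii_octets(data: bytes) -> list[tuple[int, str]]:
    return [(i, f"{b:02x}") for i, b in enumerate(data) if b >= 0x80]


if __name__ == "__main__":
    # Sample bytes matching the test file used earlier in this mail.
    body = b"little e: \xc3\xa9; big E: \xc3\x89 \n"
    for offset, value in non_ascii_octets(body):
        print(offset, value)
```

Run it on the bytes from the backend and again on the bytes from the proxying server, and compare where they start to differ.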

Cheers,

	f
-- 
Francis Daly        francis at daoine.org

