International characters and serving files

Maxim Dounin mdounin at mdounin.ru
Sat Feb 10 14:24:34 UTC 2024


Hello!

On Sat, Feb 10, 2024 at 03:14:02PM +1000, David Connors wrote:

> Hi All,
> 
> I moved off IIS/Windows onto nginx on Ubuntu a while back. Since doing
> so I receive 404s for files with international characters in their name.
> I've added the charset utf-8 directive to the nginx config. Looking at the
> request:
> 
> https://www.davidconnors.com/wp-content/uploads/2022/08/Aliinale-Für-Alina.pdf
> 
> I can confirm that the file exists on the filesystem:
> 
> -rwx------  1 www-data www-data 10560787 Aug 21  2022 Aliinale-Für-Alina.pdf
> 
> If I copy the file to a.pdf and request that, it serves fine.
> 
> Access log shows the character with the diacritic mark is escaped:
> 172.68.210.38 - - [10/Feb/2024:05:11:27 +0000] "GET
> /wp-content/uploads/2022/08/Aliinale-F%C3%BCr-Alina.pdf HTTP/1.1" 404 27524
> "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15
> (KHTML, like Gecko) Version/17.2.1 Safari/605.1.15"
> 
> What configuration directive am I missing?

File names on Unix systems are typically stored as bytes, and it
is the user's responsibility to interpret them according to a
particular character set.

Since nginx returns 404, this suggests that you don't have a
file whose name contains the UTF-8 bytes C3 BC: instead, there is
something different.  My best guess is that you are using Latin1
as the charset for your terminal, and there is an FC byte instead.  To
see what is actually there, consider looking at the raw bytes in the
file name with something like "ls | hd".
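
For illustration, here is a minimal Python sketch of the same check.
It is not part of nginx, and the helper names "describe" and "scan"
are made up for this example.  For each name in a directory it prints
the raw bytes as hex, whether they decode as UTF-8, and the
percent-encoded path a client would have to request to match that
exact name:

```python
import os
from urllib.parse import quote


def describe(name: bytes) -> dict:
    """Summarize the raw bytes of a single file name."""
    try:
        name.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {
        "hex": name.hex(" "),      # raw bytes, space separated
        "valid_utf8": valid_utf8,  # False for e.g. a lone FC byte
        "url_path": quote(name),   # percent-encoding of the raw bytes
    }


def scan(directory: bytes = b".") -> None:
    """Print one line per directory entry; bytes in, bytes out."""
    # Passing a bytes path makes os.listdir return undecoded names.
    for name in sorted(os.listdir(directory)):
        info = describe(name)
        print(info["hex"], info["valid_utf8"], info["url_path"])


if __name__ == "__main__":
    scan()
```

A name stored with the UTF-8 bytes C3 BC shows up as "F%C3%BCr",
which is what the browser requested in the log above, while a name
stored with a Latin1 FC byte shows up as "F%FCr" and is reported as
not valid UTF-8.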

Also, you can use the nginx autoindex module - it will generate a page
with properly escaped links, so it will be possible to access
files regardless of the charset used in the file names.
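
To show why escaped links work regardless of charset, here is a rough
Python sketch of the idea.  It only approximates what such a listing
does and is not the autoindex implementation; "link_for" and
"index_links" are hypothetical names.  The link target is built from
the raw bytes of the name, so it matches the file on disk even when
the visible label cannot be decoded cleanly:

```python
import html
import os
from urllib.parse import quote


def link_for(name: bytes) -> str:
    """Build an HTML link whose href percent-encodes the raw name bytes."""
    href = quote(name)
    # The label is for humans only; undecodable bytes become U+FFFD.
    label = html.escape(name.decode("utf-8", errors="replace"))
    return f'<a href="{href}">{label}</a>'


def index_links(directory: bytes = b".") -> list:
    """Return one link per entry in the directory, sorted by raw bytes."""
    return [link_for(name) for name in sorted(os.listdir(directory))]


if __name__ == "__main__":
    for line in index_links():
        print(line)
```

Following such a link sends back exactly the bytes stored in the
directory entry, which is the reason the request succeeds whether the
name is UTF-8 or Latin1.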

-- 
Maxim Dounin
http://mdounin.ru/

