International characters and serving files
Maxim Dounin
mdounin at mdounin.ru
Sat Feb 10 14:24:34 UTC 2024
Hello!
On Sat, Feb 10, 2024 at 03:14:02PM +1000, David Connors wrote:
> Hi All,
>
> I moved off IIS/Windows onto nginx on Ubuntu a while back. Since doing
> so I receive 404s for files with international characters in their name.
> I've added the charset utf-8 directive to the nginx config. Looking at the
> request:
>
> https://www.davidconnors.com/wp-content/uploads/2022/08/Aliinale-Für-Alina.pdf
>
> I confirmed that the file exists on the filesystem:
>
> -rwx------ 1 www-data www-data 10560787 Aug 21 2022 Aliinale-Für-Alina.pdf
>
> If I copy the file to a.pdf and request that, it serves fine.
>
> Access log shows the character with the diacritic mark is escaped:
> 172.68.210.38 - - [10/Feb/2024:05:11:27 +0000] "GET
> /wp-content/uploads/2022/08/Aliinale-F%C3%BCr-Alina.pdf HTTP/1.1" 404 27524
> "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15
> (KHTML, like Gecko) Version/17.2.1 Safari/605.1.15"
>
> What configuration directive am I missing?
File names on Unix systems are typically stored as bytes, and it
is the user's responsibility to interpret them according to a
particular character set.
Since nginx returns 404, you most likely don't have a file whose
name contains the UTF-8 bytes C3 BC: there is something different
instead. My best guess is that you are using Latin1 as the charset
for your terminal, and the name contains an FC byte instead. To see
what is actually there, consider looking at the raw bytes in the
file name with something like "ls | hd".
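If "hd" is not available, the same check can be sketched in Python.
This is only an illustration: describe_name() and list_raw_names()
are hypothetical helper names, not part of nginx or any standard
tool. The sketch prints each file name as hex bytes and notes
whether those bytes decode as UTF-8.

```python
import os


def describe_name(raw: bytes) -> str:
    """Return the hex bytes of a file name and whether they decode as UTF-8."""
    hex_bytes = raw.hex(" ").upper()
    try:
        raw.decode("utf-8")
        verdict = "valid UTF-8"
    except UnicodeDecodeError:
        # Assumption: a name that is not UTF-8 probably uses Latin1 or
        # another 8-bit charset; the bytes alone cannot tell which one.
        verdict = "not valid UTF-8"
    return f"{hex_bytes}  [{verdict}]"


def list_raw_names(directory: bytes = b".") -> None:
    """Print the raw bytes of every name in the given directory."""
    # Passing a bytes path makes os.listdir() return undecoded byte names.
    for raw in sorted(os.listdir(directory)):
        print(describe_name(raw))


# "Für" stored as UTF-8 has C3 BC; stored as Latin1 it has a single FC.
print(describe_name(b"F\xc3\xbcr"))  # 46 C3 BC 72  [valid UTF-8]
print(describe_name(b"F\xfcr"))      # 46 FC 72  [not valid UTF-8]
```

Calling list_raw_names(b"/path/to/uploads") (a placeholder path) in
the directory in question would show which of the two byte
sequences is actually stored on disk.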
Also, you can use the nginx autoindex module - it will generate a
page with properly escaped links, so it will be possible to access
files regardless of the charset used in the file names.
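To illustrate why escaped links work regardless of charset, here is
a rough Python sketch of percent-escaping the raw bytes of a file
name. It is not nginx's implementation, and the exact set of
characters autoindex escapes may differ; escaped_link() is a
hypothetical helper. The point is only that the link is built from
the bytes on disk, not from any interpretation of them.

```python
from urllib.parse import quote


def escaped_link(raw_name: bytes) -> str:
    """Percent-escape the raw bytes of a file name for use in a URL path."""
    # quote() accepts bytes and escapes each unsafe byte as %XX.
    return quote(raw_name)


# A name stored as UTF-8 yields the URL seen in the access log above.
print(escaped_link(b"Aliinale-F\xc3\xbcr-Alina.pdf"))  # Aliinale-F%C3%BCr-Alina.pdf

# The same visible name stored as Latin1 yields a different URL.
print(escaped_link(b"Aliinale-F\xfcr-Alina.pdf"))      # Aliinale-F%FCr-Alina.pdf
```

A request for the %C3%BC form can only match a file whose name
contains the bytes C3 BC, which is why the request returns 404 if
the name on disk actually contains FC.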
--
Maxim Dounin
http://mdounin.ru/