<div dir="auto">Hi, sorry for bothering you.</div><div dir="auto">It looks good to me. Thanks!</div><div dir="auto"><br></div><div dir="auto">—</div><div dir="auto">Yugo Horie</div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Mar 2, 2023 at 8:51 Maxim Dounin <<a href="mailto:mdounin@mdounin.ru">mdounin@mdounin.ru</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello!<br>

<br>

On Thu, Feb 23, 2023 at 09:24:52AM +0900, u5h wrote:<br>

<br>

> Thanks reviewing!<br>

> <br>

> I agree with your early return strategy and I would reconsider that<br>

> condition below.<br>

> <br>

> # HG changeset patch<br>

> # User Yugo Horie <<a href="mailto:u5.horie@gmail.com" target="_blank">u5.horie@gmail.com</a>><br>

> # Date 1677107390 -32400<br>

> #      Thu Feb 23 08:09:50 2023 +0900<br>

> # Node ID a3ca45d39fcfd32ca92a6bd25ec18b6359b90f1a<br>

> # Parent  f4653576ffcd286bed7229e18ee30ec3c713b4de<br>

> Core: restrict the rule of utf-8 decode.<br>

> <br>

> The first byte being above 0xf8 which is referred to 5byte<br>

> over length older utf-8 becomes invalid.<br>

> Even the range of the first byte from 0xf5 to<br>

> 0xf7 is valid in the term of the codepoint decoding.<br>

> See <a href="https://datatracker.ietf.org/doc/html/rfc3629#section-4" rel="noreferrer" target="_blank">https://datatracker.ietf.org/doc/html/rfc3629#section-4</a>.<br>

> <br>

> diff -r f4653576ffcd -r a3ca45d39fcf src/core/ngx_string.c<br>

> --- a/src/core/ngx_string.c     Thu Feb 23 07:56:44 2023 +0900<br>

> +++ b/src/core/ngx_string.c     Thu Feb 23 08:09:50 2023 +0900<br>

> @@ -1363,8 +1363,12 @@<br>

>      uint32_t  u, i, valid;<br>

> <br>

>      u = **p;<br>

> -<br>

> -    if (u >= 0xf0) {<br>

> +    if (u >= 0xf8) {<br>

> +<br>

> +        (*p)++;<br>

> +        return 0xffffffff;<br>

> +<br>

> +    } else if (u >= 0xf0) {<br>

> <br>

>          u &= 0x07;<br>

>          valid = 0xffff;<br>

<br>

Slightly adjusted the commit log to better explain the issue (and <br>

restored the accidentally removed empty line).  Please take a look <br>

if it seems good enough:<br>

<br>

# HG changeset patch<br>

# User Yugo Horie <<a href="mailto:u5.horie@gmail.com" target="_blank">u5.horie@gmail.com</a>><br>

# Date 1677107390 -32400<br>

#      Thu Feb 23 08:09:50 2023 +0900<br>

# Node ID a10210a45c8b6e6bb75e98b2fd64a80c184ae247<br>

# Parent  2acb00b9b5fff8a97523b659af4377fc605abe6e<br>

Core: stricter UTF-8 handling in ngx_utf8_decode().<br>

<br>

An UTF-8 octet sequence cannot start with a 11111xxx byte (above 0xf8),<br>

see <a href="https://datatracker.ietf.org/doc/html/rfc3629#section-3" rel="noreferrer" target="_blank">https://datatracker.ietf.org/doc/html/rfc3629#section-3</a>.  Previously,<br>

such bytes were accepted by ngx_utf8_decode() and misinterpreted as 11110xxx<br>

bytes (as in a 4-byte sequence).  While unlikely, this can potentially cause<br>

issues.<br>

<br>

Fix is to explicitly reject such bytes in ngx_utf8_decode().<br>

<br>

diff --git a/src/core/ngx_string.c b/src/core/ngx_string.c<br>

--- a/src/core/ngx_string.c<br>

+++ b/src/core/ngx_string.c<br>

@@ -1364,7 +1364,12 @@ ngx_utf8_decode(u_char **p, size_t n)<br>

<br>

     u = **p;<br>

<br>

-    if (u >= 0xf0) {<br>

+    if (u >= 0xf8) {<br>

+<br>

+        (*p)++;<br>

+        return 0xffffffff;<br>

+<br>

+    } else if (u >= 0xf0) {<br>

<br>

         u &= 0x07;<br>

         valid = 0xffff;<br>

<br>

<br>

-- <br>

Maxim Dounin<br>

<a href="http://mdounin.ru/" rel="noreferrer" target="_blank">http://mdounin.ru/</a><br>

_______________________________________________<br>

nginx-devel mailing list<br>

<a href="mailto:nginx-devel@nginx.org" target="_blank">nginx-devel@nginx.org</a><br>

<a href="https://mailman.nginx.org/mailman/listinfo/nginx-devel" rel="noreferrer" target="_blank">https://mailman.nginx.org/mailman/listinfo/nginx-devel</a><br>

</blockquote></div></div>