[PATCH] Core: return error when the first byte is above 0xf5 in utf-8
u5h
u5.horie at gmail.com
Thu Mar 2 08:17:05 UTC 2023
Hi, sorry for bothering you.
It looks good to me. Thanks!
--
Yugo Horie
On Thu, Mar 2, 2023 at 8:51 Maxim Dounin <mdounin at mdounin.ru> wrote:
> Hello!
>
> On Thu, Feb 23, 2023 at 09:24:52AM +0900, u5h wrote:
>
> > Thanks for reviewing!
> >
> > I agree with your early-return strategy, and I have reconsidered the
> > condition as shown below.
> >
> > # HG changeset patch
> > # User Yugo Horie <u5.horie at gmail.com>
> > # Date 1677107390 -32400
> > # Thu Feb 23 08:09:50 2023 +0900
> > # Node ID a3ca45d39fcfd32ca92a6bd25ec18b6359b90f1a
> > # Parent f4653576ffcd286bed7229e18ee30ec3c713b4de
> > Core: restrict the rule of utf-8 decode.
> >
> > A first byte of 0xf8 or above, which introduced sequences of 5 or
> > more bytes in the older UTF-8 definition, is now rejected as invalid.
> > A first byte in the range 0xf5 to 0xf7 is still accepted as far as
> > code point decoding is concerned.
> > See https://datatracker.ietf.org/doc/html/rfc3629#section-4.
> >
> > diff -r f4653576ffcd -r a3ca45d39fcf src/core/ngx_string.c
> > --- a/src/core/ngx_string.c Thu Feb 23 07:56:44 2023 +0900
> > +++ b/src/core/ngx_string.c Thu Feb 23 08:09:50 2023 +0900
> > @@ -1363,8 +1363,12 @@
> >      uint32_t u, i, valid;
> >
> >      u = **p;
> > -
> > -    if (u >= 0xf0) {
> > +    if (u >= 0xf8) {
> > +
> > +        (*p)++;
> > +        return 0xffffffff;
> > +
> > +    } else if (u >= 0xf0) {
> >
> >          u &= 0x07;
> >          valid = 0xffff;
>
> Slightly adjusted the commit log to better explain the issue (and
> restored the accidentally removed empty line). Please take a look
> if it seems good enough:
>
> # HG changeset patch
> # User Yugo Horie <u5.horie at gmail.com>
> # Date 1677107390 -32400
> # Thu Feb 23 08:09:50 2023 +0900
> # Node ID a10210a45c8b6e6bb75e98b2fd64a80c184ae247
> # Parent 2acb00b9b5fff8a97523b659af4377fc605abe6e
> Core: stricter UTF-8 handling in ngx_utf8_decode().
>
> A UTF-8 octet sequence cannot start with a 11111xxx byte (0xf8 or above),
> see https://datatracker.ietf.org/doc/html/rfc3629#section-3. Previously,
> such bytes were accepted by ngx_utf8_decode() and misinterpreted as
> 11110xxx bytes (as in a 4-byte sequence). While unlikely, this can
> potentially cause issues.
>
> Fix is to explicitly reject such bytes in ngx_utf8_decode().
>
> diff --git a/src/core/ngx_string.c b/src/core/ngx_string.c
> --- a/src/core/ngx_string.c
> +++ b/src/core/ngx_string.c
> @@ -1364,7 +1364,12 @@ ngx_utf8_decode(u_char **p, size_t n)
>
>      u = **p;
>
> -    if (u >= 0xf0) {
> +    if (u >= 0xf8) {
> +
> +        (*p)++;
> +        return 0xffffffff;
> +
> +    } else if (u >= 0xf0) {
>
>          u &= 0x07;
>          valid = 0xffff;
>
>
> --
> Maxim Dounin
> http://mdounin.ru/
>
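The misinterpretation described in the commit log can be illustrated with a small standalone sketch. It is not nginx code: old_lead_bits() and new_lead_bits() are hypothetical helpers that only mirror the two conditions shown in the diff (masking with 0x07 for any byte >= 0xf0, versus rejecting bytes >= 0xf8 first), and they assume the caller has already established that the byte is >= 0xf0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; they are not nginx functions. */

/* Old condition: any byte >= 0xf0 is masked as if it were 11110xxx. */
static uint32_t
old_lead_bits(uint8_t b)
{
    assert(b >= 0xf0);

    return b & 0x07;
}

/* Patched condition: 11111xxx bytes are rejected before masking. */
static uint32_t
new_lead_bits(uint8_t b)
{
    assert(b >= 0xf0);

    if (b >= 0xf8) {
        return 0xffffffff;      /* same error value the patch returns */
    }

    return b & 0x07;
}
```

With these helpers, old_lead_bits(0xf8) and old_lead_bits(0xf0) both yield 0, so the two lead bytes are indistinguishable after masking, while new_lead_bits(0xf8) yields 0xffffffff.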