[njs] Fixed RegExp.prototype.exec() with global regexp and unicode input.

Dmitry Volyntsev xeioex at nginx.com
Wed Oct 18 00:52:30 UTC 2023


details:   https://hg.nginx.org/njs/rev/c0ff44d66ffb
branches:  
changeset: 2221:c0ff44d66ffb
user:      Dmitry Volyntsev <xeioex at nginx.com>
date:      Tue Oct 17 17:51:39 2023 -0700
description:
Fixed RegExp.prototype.exec() with global regexp and unicode input.

Previously, when a Unicode string of exactly 32 characters was provided
and the "lastIndex" value of "this" regexp was also equal to 32,
njs_string_utf8_offset() was called with an invalid index argument (past
the last character of the string).  As a result, njs_string_utf8_offset()
returned garbage values.

This manifested in the following ways:
1) InternalError: pcre2_match() failed: bad offset value

2) Very slow replace calls with global regexps, for
   example in expressions like: str.replace(/<re>/g).

This fixes #677 on GitHub.
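
For illustration only, the bounds check described above can be sketched
as a standalone C fragment.  This is not the njs code: utf8_offset() and
match_start_offset() are hypothetical helpers, the input is assumed to be
valid UTF-8, and "length" is assumed to be the character count of the
buffer while "size" is its byte count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not njs_string_utf8_offset()): returns the byte
 * offset of the character at position "index" in a valid UTF-8 buffer.
 * Stops at the end of the buffer if "index" is too large.
 */
static size_t
utf8_offset(const uint8_t *start, size_t size, size_t index)
{
    size_t  pos = 0;

    while (index > 0 && pos < size) {
        pos++;

        /* Skip continuation bytes (10xxxxxx) of the current character. */
        while (pos < size && (start[pos] & 0xC0) == 0x80) {
            pos++;
        }

        index--;
    }

    return pos;
}

/*
 * Mirrors the shape of the fix: the character index is converted to a
 * byte offset only when it is inside the string; otherwise the offset
 * is the end of the buffer.
 */
static size_t
match_start_offset(const uint8_t *start, size_t size, size_t length,
    size_t last_index)
{
    /* A string never has more characters than bytes. */
    assert(length <= size);

    if (last_index < length) {
        return utf8_offset(start, size, last_index);
    }

    return size;
}
```

With the 32-character test string from this changeset (30 two-byte
characters followed by "aa", 62 bytes in total), a "lastIndex" of 32
maps to byte offset 62, so the matcher is given the end of the string
instead of an out-of-range value.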

diffstat:

 src/njs_regexp.c         |  11 ++++++++---
 src/test/njs_unit_test.c |   6 ++++++
 2 files changed, 14 insertions(+), 3 deletions(-)

diffs (37 lines):

diff -r 714fae197d83 -r c0ff44d66ffb src/njs_regexp.c
--- a/src/njs_regexp.c	Mon Oct 16 18:09:37 2023 -0700
+++ b/src/njs_regexp.c	Tue Oct 17 17:51:39 2023 -0700
@@ -936,9 +936,14 @@ njs_regexp_builtin_exec(njs_vm_t *vm, nj
         offset = last_index;
 
     } else {
-        offset = njs_string_utf8_offset(string.start,
-                                        string.start + string.size, last_index)
-                 - string.start;
+        if ((size_t) last_index < string.length) {
+            offset = njs_string_utf8_offset(string.start,
+                                            string.start + string.size,
+                                            last_index)
+                     - string.start;
+        } else {
+            offset = string.size;
+        }
     }
 
     ret = njs_regexp_match(vm, &pattern->regex[type], string.start, offset,
diff -r 714fae197d83 -r c0ff44d66ffb src/test/njs_unit_test.c
--- a/src/test/njs_unit_test.c	Mon Oct 16 18:09:37 2023 -0700
+++ b/src/test/njs_unit_test.c	Tue Oct 17 17:51:39 2023 -0700
@@ -9261,6 +9261,12 @@ static njs_unit_test_t  njs_test[] =
     { njs_str("'abc'.replaceAll(/^/g, '|$&|')"),
       njs_str("||abc") },
 
+    { njs_str("('α'.repeat(30) + 'aa').replace(/a/g, '#')"),
+      njs_str("αααααααααααααααααααααααααααααα##") },
+
+    { njs_str("('α'.repeat(30) + 'aa').replaceAll(/a/g, '#')"),
+      njs_str("αααααααααααααααααααααααααααααα##") },
+
     { njs_str("var uri ='/u/v1/Aa/bB?type=m3u8&mt=42';"
               "uri.replace(/^\\/u\\/v1\\/[^/]*\\/([^\?]*)\\?.*(mt=[^&]*).*$/, '$1|$2')"),
       njs_str("bB|mt=42") },

