[PATCH] Win32: PCRE2 Unicode support with MSVC

Maxim Dounin mdounin at mdounin.ru
Tue Mar 21 19:49:53 UTC 2023


Hello!

On Tue, Mar 21, 2023 at 01:50:58PM +0400, Sergey Kandaurov wrote:

> 
> > On 21 Mar 2023, at 03:55, Maxim Dounin <mdounin at mdounin.ru> wrote:
> > 
> > Hello!
> > 
> > On Mon, Mar 20, 2023 at 06:58:32PM +0400, Sergey Kandaurov wrote:
> > 
> >> # HG changeset patch
> >> # User Sergey Kandaurov <pluknet at nginx.com>
> >> # Date 1679324252 -14400
> >> #      Mon Mar 20 18:57:32 2023 +0400
> >> # Node ID d0b013a7050e00613804b399ae2ca74551b2a071
> >> # Parent  8771d35d55d0a2b1cefaab04401d6f837f5a05a2
> >> Win32: PCRE2 Unicode support with MSVC.
> >> 
> >> Unicode support in PCRE2 is enabled by default on the configure/cmake
> >> side by defining SUPPORT_UNICODE.  Previously, this macro was not defined
> >> when compiling PCRE2 sources directly for Windows with MSVC.
> >> 
> >> In particular, this change allows specifying Unicode properties, such as
> >> \P, \p, or \X, as caught by http_server_name.t adjusted to run on Windows:
> >> 
> >> nginx: [emerg] pcre2_compile() failed: this version of PCRE2 does not have
> >> support for \P, \p, or \X
> >> 
> >> diff --git a/auto/lib/pcre/make b/auto/lib/pcre/make
> >> --- a/auto/lib/pcre/make
> >> +++ b/auto/lib/pcre/make
> >> @@ -61,7 +61,7 @@ if [ $PCRE_LIBRARY = PCRE2 ]; then
> >> 
> >> PCRE_CFLAGS =	-O2 -Ob1 -Oi -Gs $LIBC $CPU_OPT
> >> PCRE_FLAGS =	-DHAVE_CONFIG_H -DPCRE2_STATIC -DPCRE2_CODE_UNIT_WIDTH=8 \\
> >> -		-DHAVE_MEMMOVE
> >> +		-DHAVE_MEMMOVE -DSUPPORT_UNICODE
> >> 
> >> PCRE_SRCS =	 $ngx_pcre_srcs
> >> PCRE_OBJS =	 $ngx_pcre_objs
> > 
> > The PCRE2 compilation in auto/lib/pcre/make mostly matches PCRE 
> > compilation in auto/lib/pcre/makefile.msvc, and it never tried to 
> > enable Unicode / UTF-8 support.  This in turn matches PCRE 
> > configure behaviour: UTF-8 support is disabled by default and 
> > needs to be explicitly enabled.
> 
> This looks unrelated, because PCRE(1) is a different library.

Well, not really.  Even if we consider these to be different 
libraries, both PCRE and PCRE2 implement regular expression 
matching as used by nginx, and it seems logical for nginx to 
compile them with similar / identical feature sets when asked to 
compile them.

> > While we might consider enabling Unicode support for PCRE2, since 
> > it is now enabled by default in PCRE2 (or for both PCRE and PCRE2, 
> > since it is something usually expected to work nowadays), for 
> > tests a better solution might be not to rely on this.  Unicode / 
> > UTF-8 support might not be available on various other platforms as 
> > well, so it generally might be a good idea to adjust tests to 
> > tolerate PCRE/PCRE2 compiled without Unicode / UTF-8 support.
> 
> As explained in the commit log, PCRE2 has Unicode support by default,
> as configured by configure or cmake, so its availability depends on how
> nginx for Windows was built:
> - PCRE2 Unicode support is present if built with GCC MinGW
> - it is not if built manually (missing configuration) with MSVC
> 
> Such inconsistency doesn't look related to "various other platforms".

Similarly, when nginx on Windows is built with PCRE instead of 
PCRE2, Unicode support won't be available.  The same applies to 
PCRE on Unix as long as it is built by nginx (and UTF-8 support 
is not explicitly enabled with "--with-pcre-opt=...").  And 
there can be other platforms where PCRE (or even PCRE2) is built 
without Unicode support.
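
As a rough illustration of what "tolerating" such builds could 
look like, here is a minimal sketch in Python (not the Perl used 
by the test suite).  It only checks captured error output for the 
PCRE2 message quoted in the commit log above.  The helper name 
and the idea of skipping a test on a match are assumptions made 
for illustration, not existing nginx or test suite code, and the 
corresponding PCRE message is not covered.

```python
# Hypothetical helper, not part of nginx or its test suite.
# The marker text is taken from the PCRE2 error message quoted in
# the commit log; the message emitted by PCRE (1) is not handled.
NO_UNICODE_MARKER = r"does not have support for \P, \p, or \X"


def lacks_unicode_support(error_output: str) -> bool:
    """Return True if the output shows a regex rejected because the
    library was compiled without Unicode support."""
    # The message may be wrapped across lines (as in the commit log),
    # so collapse all whitespace runs before searching.
    normalized = " ".join(error_output.split())
    return NO_UNICODE_MARKER in normalized


if __name__ == "__main__":
    sample = (
        "nginx: [emerg] pcre2_compile() failed: this version of PCRE2 "
        "does not have\nsupport for \\P, \\p, or \\X\n"
    )
    if lacks_unicode_support(sample):
        print("skip: regex library built without Unicode support")
```

A test could use such a check to skip Unicode-specific cases 
instead of failing, regardless of which platform or compiler 
produced the library.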

-- 
Maxim Dounin
http://mdounin.ru/

