[PATCH 2 of 4] Win32: handling of localized MSVC cl output
Maxim Dounin
mdounin at mdounin.ru
Fri Feb 10 19:15:08 UTC 2023
Hello!
On Fri, Feb 10, 2023 at 03:21:05PM +0400, Sergey Kandaurov wrote:
>
> > On 20 Dec 2022, at 17:30, Maxim Dounin <mdounin at mdounin.ru> wrote:
> >
> > # HG changeset patch
> > # User Maxim Dounin <mdounin at mdounin.ru>
> > # Date 1671541078 -10800
> > # Tue Dec 20 15:57:58 2022 +0300
> > # Node ID 43098cb134a87a404b70fcc77ad01ca343cba969
> > # Parent f5d9c24fb4ac2a6b82b9d842b88978a329690138
> > Win32: handling of localized MSVC cl output.
> >
> > Output examples in English, Russian, and Spanish:
> >
> > Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for 80x86
> > Оптимизирующий 32-разрядный компилятор Microsoft (R) C/C++ версии 16.00.30319.01 для 80x86
> > Compilador de optimización de C/C++ de Microsoft (R) versión 16.00.30319.01 para x64
> >
>
> The transaction to import this change with mercurial on a non-localized
> win2003 aborts with the error:
>
> abort: decoding near '....': 'charmap' codec can't decode byte 0x8f in position 218: character maps to <undefined>!
>
> The position matches UTF-8 U+044F (0xd18f), Cyrillic small letter ya:
> $ hg ex --template {desc} | hexdump -C | grep `printf "%08x\n" $((218/16*16))`
> 000000d0 2d d1 80 d0 b0 d0 b7 d1 80 d1 8f d0 b4 d0 bd d1 |-?.аз?.?.дн?|
>
> Although the error can be suppressed using HGENCODING, "hg log"
> produces garbled output in place of Cyrillic and accented characters.
>
> The safest solution may be to mangle such localized examples into ASCII,
> so that the change applies while still giving a rough idea of how cl
> output can differ. Another way is to skip the examples.

I don't think that mangling examples is a good idea - these are
provided specifically to make it possible to review and test the
code.

On the other hand, it's quite normal that importing a patch with
UTF-8 characters on a system not configured to use UTF-8 fails.
Just pulling from another repo will work fine though. That is,
this only affects developers specifically involved in the patch
development.

Further, it is quite normal that displaying such a patch produces
garbled output on a system without UTF-8 console support. On
systems with Cyrillic and/or UTF-8 console support it will be
displayed correctly (provided that HGENCODING is set or properly
autodetected).

Overall, I tend to think that using UTF-8 in commit logs, at least
in various quotes, shouldn't be a problem. Note well that we
already have UTF-8 in various commit logs and author names.

>
> > Since most of the words are translated, instead of looking for the words
> > "Compiler Version" we now search for "C/C++" and the version number.
> >
> > diff -r f5d9c24fb4ac -r 43098cb134a8 auto/cc/msvc
> > --- a/auto/cc/msvc Tue Dec 20 15:57:51 2022 +0300
> > +++ b/auto/cc/msvc Tue Dec 20 15:57:58 2022 +0300
> > @@ -11,8 +11,8 @@
> > # MSVC 2015 (14.0) cl 19.00
> >
> >
> > -NGX_MSVC_VER=`$NGX_WINE $CC 2>&1 | grep 'Compiler Version' 2>&1 \
> > - | sed -e 's/^.* Version \(.*\)/\1/'`
> > +NGX_MSVC_VER=`$NGX_WINE $CC 2>&1 | grep 'C/C++.* [0-9][0-9]*\.[0-9]' 2>&1 \
> > + | sed -e 's/^.* \([0-9][0-9]*\.[0-9].*\)/\1/'`
> >
> > echo " + cl version: $NGX_MSVC_VER"
> >
>
> I recall there were discussions whether we can avoid using the grep command,
> or if it should search other words for better matching.
> Personally, I think the proposed change is good enough.

The change which was previously discussed was slightly different
(and grep wasn't adding anything there). In this change, usage
of grep is perfectly in line with other uses (and selects just one
line, which is then modified by sed).

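
For review and testing, the new grep + sed pair can be tried against
the three banner lines quoted in the commit log without running cl at
all. The following is only a sketch: the msvc_ver helper name and the
loop are illustrative assumptions, while the two patterns are copied
from the patch.

```shell
#!/bin/sh
# msvc_ver is a hypothetical helper name; the pipeline inside it is
# the grep + sed pair from the patch, reading cl output on stdin.
msvc_ver() {
    grep 'C/C++.* [0-9][0-9]*\.[0-9]' \
        | sed -e 's/^.* \([0-9][0-9]*\.[0-9].*\)/\1/'
}

# Banner lines quoted in the commit log (English, Russian, Spanish).
en='Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for 80x86'
ru='Оптимизирующий 32-разрядный компилятор Microsoft (R) C/C++ версии 16.00.30319.01 для 80x86'
es='Compilador de optimización de C/C++ de Microsoft (R) versión 16.00.30319.01 para x64'

# Print what would end up in NGX_MSVC_VER for each banner.
for banner in "$en" "$ru" "$es"; do
    printf '%s\n' "$banner" | msvc_ver
done
```

In each case the result starts with the version number, followed by
the trailing target text ("for 80x86", "для 80x86", "para x64"). The
old sed expression kept that tail as well, since its capture group
also ran to the end of the line.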
--
Maxim Dounin
http://mdounin.ru/