[PATCH 0 of 4] Avoid dead store elimination in GCC 11+
Vadim Fedorenko
vadim.fedorenko at cdnnow.ru
Wed Apr 19 08:55:33 UTC 2023
Hi!
On 18.04.2023 20:14, Maxim Dounin wrote:
> Hello!
>
> On Tue, Apr 18, 2023 at 10:50:01AM +0100, Vadim Fedorenko wrote:
>
>> On 18.04.2023 02:54, Maxim Dounin wrote:
>>> Hello!
>>>
>>> On Tue, Apr 18, 2023 at 02:07:06AM +0300, Vadim Fedorenko via nginx-devel wrote:
>>>
>>>> GCC version 11 and newer use more aggressive way to eliminate dead stores
>>>> which ends up removing ngx_memzero() calls in several places. Such optimization
>>>> affects calculations of md5 and sha1 implemented internally in nginx. The
>>>> effect could be easily observed by adding a random data to buffer array in
>>>> md5_init() or sha1_init() functions. With this simple modifications the result
>>>> of the hash computation will be different each time even though the provided
>>>> data to hash is not changed.
>>>
>>> If calculations of md5 and sha1 are affected, this means that the
>>> stores in question are not dead, and they shouldn't be eliminated
>>> in the first place. From your description this looks like a bug
>>> in the compiler in question.
>>
>> Yeah, these ngx_memzero()s must not be dead, but according to the standart they
>> are. In md5_final() the function is called this way:
>> ngx_memzero(&ctx->buffer[used], free - 8);
>> That means that a new variable of type 'char *' is created with the life time
>> scoped to the call to ngx_memzero(). As the result of of the function is ignored
>> explicitly, no other parameters are passed by pointer, and the variable is not
>> accessed anywhere else, the whole call can be optimized out.
>
> The pointer is passed to the function, and the function modifies
> the memory being pointed to by the pointer. While the pointer is
> not used anywhere else and can be optimized out, the memory it
> points to is used elsewhere, and this modification cannot be
> optimized out, so it is incorrect to remove the call according to
> my understanding of the C standard.
>
> If you still think it's correct and based on the C standard,
> please provide relevant references (and quotes) which explain why
> these calls can be optimized out.
>
>>> Alternatively, this can be a bug in nginx code which makes the
>>> compiler think that it can eliminate these ngx_memzero() calls - for
>>> example, GCC is known to do such things if it sees an undefined
>>> behaviour in the code.
>>
>> There is no undefined behavior unfortunately, everything in this place is well
>> defined.
>
> Well, I don't think so. There is a function call, and it cannot
> be eliminated by the compiler unless the compiler thinks that the
> results of the function call do not affect the program execution
> as externally observed. Clearly the program execution is affected
> (as per your claim). This leaves us the two possible
> alternatives:
>
> - There is a bug in the compiler, and it incorrectly thinks that
> the function call do not affect the program execution.
>
> - There is a bug in the code, and it triggers undefined behaviour,
> so the compiler might not actually know what happens in the code
> (because it not required to do anything meaningful in case of
> undefined behaviour, and simply assume it should never happen).
>
> Just in case, the actual undefined behaviour might occur in the
> ngx_md5_body() function due to strict-aliasing rules being broken
> by the optimized GET() macro on platforms without strict alignment
> requirements if the original data buffer as provided to
> ngx_md5_update() cannot be aliased by uint32_t. See this commit in
> the original repository of the md5 code nginx uses:
>
> https://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/popa3d/popa3d/md5/md5.c.diff?r1=1.14;r2=1.15
>
> But nginx only uses ngx_md5_update() with text buffers, so
> strict-aliasing rules aren't broken.
>
>>> You may want to elaborate more on how to reproduce this, and, if
>>> possible, how to build a minimal test case which demonstrates the
>>> problem.
>>
>> Sure, let's elaborate a bit. To reproduce the bug you can simply apply the diff:
>>
>> diff --git a/src/core/ngx_md5.c b/src/core/ngx_md5.c
>> index c25d0025d..67cc06438 100644
>> --- a/src/core/ngx_md5.c
>> +++ b/src/core/ngx_md5.c
>> @@ -24,6 +24,7 @@ ngx_md5_init(ngx_md5_t *ctx)
>> ctx->d = 0x10325476;
>>
>> ctx->bytes = 0;
>> + getrandom(ctx->buffer, 64, 0);
>> }
>>
>
> Note that this won't compile, it also needs "#include <sys/random.h>".
>
>> This code will emulate the garbage for the stack-allocated 'ngx_md5_t md5;' in
>> ngx_http_file_cache_create_key when nginx is running under the load. Then you
>> can use simple configuration:
>>
>> upstream test_001_origin {
>> server 127.0.0.1:8000;
>> }
>>
>> proxy_cache_path /var/cache/nginx/test-001 keys_zone=test_001:10m max_size=5g
>> inactive=24h levels=1:2 use_temp_path=off;
>>
>> server {
>> listen 127.0.0.1:8000;
>>
>> location = /plain {
>> return 200;
>> }
>>
>> }
>>
>> server {
>> listen 127.0.0.1:80;
>>
>> location /oplain {
>> proxy_cache test_001;
>> proxy_cache_key /oplain;
>> proxy_pass http://test_001_origin/plain/;
>> }
>> }
>>
>>
>> Every time you call 'curl http://127.0.0.1/oplain' a new cache file will be
>> created, but the md5sum of the file will be the same, meaining that the key
>> stored in the file is absolutely the same.
>
> Note that the exact configuration will make "GET /plain/" requests
> to upstream server, resulting in 404 and nothing cached. I've
> fixed this to actually match "location = /plain" and added
> "proxy_cache_valid 200 1h;" to ensure caching will actually work.
>
> Still, I wasn't able to reproduce the issue you are seeing on
> FreeBSD 12.4 with gcc12, neither with default compilation flags as
> used by nginx, nor with --with-cc-opt="-flto -O2" and
> --with-ld-opt="-flto -O2".
>
> On RHEL 9 (Red Hat Enterprise Linux release 9.1 (Plow) from
> redhat/ubi9 image in Docker) with gcc11 (gcc version 11.3.1
> 20220421 (Red Hat 11.3.1-2) (GCC), as installed with "yum install
> gcc") I wasn't able to reproduce this as well (also tested both
> with default compilation flags as provided by nginx, and cc/ld
> options "-flto -O2").
>
> You may want to provide more details on how to reproduce this.
> Some exact steps you've actually tested might the way to go.
>
>>>> Changing the code to use current implementation
>>>> of ngx_explicit_memzero() doesn't help because of link-time optimizations
>>>> enabled in RHEL 9 and derivatives. Glibc 2.34 found in RHEL 9 provides
>>>> explicit_bzero() function which should be used to avoid such optimization.
>>>> ngx_explicit_memzero() is changed to use explicit_bzero() if possible.
>>>
>>> The ngx_explicit_memzero() function is to be used when zeroed data
>>> are indeed not used afterwards, for example, to make sure
>>> passwords are actually eliminated from memory. It shouldn't be
>>> used instead of a real ngx_memzero() call - doing so might hide
>>> the problem, which is either in the compiler or in nginx, but
>>> won't fix it.
>>
>> In this case the nginx code should be fixed to avoid partial memory fillings,
>> but such change will come with performance penalty, especially on the CPUs
>> without proper `REP MOVSB/MOVSD/MOVSQ` implementation. Controlled usage of
>> explicit zeroing is much better is this case.
>
> You may want to elaborate on what "nginx code should be fixed to
> avoid partial memory fillings" means and why it should be
> fixed/avoided.
>
>>> As for using explicit_bzero() for it, we've looked into various
>>> OS-specific solutions, though there are too many variants out
>>> there, so it was decided that having our own simple implementation
>>> is a better way to go. If it doesn't work in the particular
>>> setup, it might make sense to adjust our implementation - but
>>> given the above, it might be the same issue which causes the
>>> original problem.
>>
>> Unfortunately, the memory barrier trick is not working anymore for linker-time
>> optimizations. Linker has better information about whether the stored
>> information is used again or not. And it will remove memset in such
>> implementation, and it will definitely affected security-related code you
>> mentioned above.
>
> Without link-time optimization, just a separate compilation unit
> with a function is more than enough.
>
> The ngx_memory_barrier() is additionally used as in many practical
> cases it introduces a compiler barrier, and this also defeats
> link-time optimization. This might not be the case for GCC
> though, as with GCC we currently use __sync_synchronize() for
> ngx_memory_barrier(). Adding an explicit compiler barrier (asm
> with the "memory" clobber should work for most compilers, but not
> all) might be the way to go if it's indeed the case.
>
> It does seem to work with GCC with link-time optimizations enabled
> though, as least in the RHEL 9 build with gcc11 "-flto -O2". I'm
> seeing this in the disassemble of ngx_http_auth_basic_handler():
>
> 0x00000000004709c6 <+918>: rep stos %rax,%es:(%rdi)
> 0x00000000004709c9 <+921>: lock orq $0x0,(%rsp)
>
> So it looks like ngx_explicit_memzero() is inlined and optimized
> to use "rep stos" instead of memset() call, but not eliminated.
>
>> explicit_bzero() function is available in well-loved *BSD
>> systems now and is a proper way to do cleaning of the artifacts, doesn't matter
>> which implementation is used in the specific system.
>
> If I recall correctly, when I last checked there were something
> like 5 different interfaces out there, including explicit_bzero(),
> explicit_memset(), memset_s(), and SecureZeroMemory(). With
> memset_s() being required by C11 standard, but with absolutely
> brain-damaged interface. (It looks like now there is also
> memset_explicit(), which is going to become a standard in C23.)
>
> As such, the decision was to use our own function which does the
> trick in most practical cases. And if for some reason it doesn't,
> this isn't a big issue: that's a mitigation technique at most.
>
Looks like I found the root cause of the issue in the code added on top of nginx
implementation of md5, which is using “type-punning” for optimization reasons,
but ended with UB when NGX_HAVE_LITTLE_ENDIAN && NGX_HAVE_NONALIGNED are
defined. Sorry for the noise and many thanks to Alejandro for help, hint with
-fstrict-aliasing reminded me this very old change in the code.
All best,
Vadim
More information about the nginx-devel
mailing list