[PATCH 0 of 4] Avoid dead store elimination in GCC 11+

Vadim Fedorenko vadim.fedorenko at cdnnow.ru
Wed Apr 19 08:55:33 UTC 2023


Hi!

On 18.04.2023 20:14, Maxim Dounin wrote:
> Hello!
> 
> On Tue, Apr 18, 2023 at 10:50:01AM +0100, Vadim Fedorenko wrote:
> 
>> On 18.04.2023 02:54, Maxim Dounin wrote:
>>> Hello!
>>>
>>> On Tue, Apr 18, 2023 at 02:07:06AM +0300, Vadim Fedorenko via nginx-devel wrote:
>>>
>>>> GCC version 11 and newer use more aggressive way to eliminate dead stores
>>>> which ends up removing ngx_memzero() calls in several places. Such optimization
>>>> affects calculations of md5 and sha1 implemented internally in nginx. The
>>>> effect could be easily observed by adding a random data to buffer array in
>>>> md5_init() or sha1_init() functions. With this simple modifications the result
>>>> of the hash computation will be different each time even though the provided
>>>> data to hash is not changed.
>>>
>>> If calculations of md5 and sha1 are affected, this means that the
>>> stores in question are not dead, and they shouldn't be eliminated
>>> in the first place.  From your description this looks like a bug
>>> in the compiler in question.
>>
>> Yeah, these ngx_memzero()s must not be dead, but according to the standart they
>> are. In md5_final() the function is called this way:
>> ngx_memzero(&ctx->buffer[used], free - 8);
>> That means that a new variable of type 'char *' is created with the life time
>> scoped to the call to ngx_memzero(). As the result of of the function is ignored
>> explicitly, no other parameters are passed by pointer, and the variable is not
>> accessed anywhere else, the whole call can be optimized out.
> 
> The pointer is passed to the function, and the function modifies
> the memory being pointed to by the pointer.  While the pointer is
> not used anywhere else and can be optimized out, the memory it
> points to is used elsewhere, and this modification cannot be
> optimized out, so it is incorrect to remove the call according to
> my understanding of the C standard.
> 
> If you still think it's correct and based on the C standard,
> please provide relevant references (and quotes) which explain why
> these calls can be optimized out.
> 
>>> Alternatively, this can be a bug in nginx code which makes the
>>> compiler think that it can eliminate these ngx_memzero() calls - for
>>> example, GCC is known to do such things if it sees an undefined
>>> behaviour in the code.
>>
>> There is no undefined behavior unfortunately, everything in this place is well
>> defined.
> 
> Well, I don't think so.  There is a function call, and it cannot
> be eliminated by the compiler unless the compiler thinks that the
> results of the function call do not affect the program execution
> as externally observed.  Clearly the program execution is affected
> (as per your claim).  This leaves us the two possible
> alternatives:
> 
> - There is a bug in the compiler, and it incorrectly thinks that
>    the function call do not affect the program execution.
> 
> - There is a bug in the code, and it triggers undefined behaviour,
>    so the compiler might not actually know what happens in the code
>    (because it not required to do anything meaningful in case of
>    undefined behaviour, and simply assume it should never happen).
> 
> Just in case, the actual undefined behaviour might occur in the
> ngx_md5_body() function due to strict-aliasing rules being broken
> by the optimized GET() macro on platforms without strict alignment
> requirements if the original data buffer as provided to
> ngx_md5_update() cannot be aliased by uint32_t.  See this commit in
> the original repository of the md5 code nginx uses:
> 
> https://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/popa3d/popa3d/md5/md5.c.diff?r1=1.14;r2=1.15
> 
> But nginx only uses ngx_md5_update() with text buffers, so
> strict-aliasing rules aren't broken.
> 
>>> You may want to elaborate more on how to reproduce this, and, if
>>> possible, how to build a minimal test case which demonstrates the
>>> problem.
>>
>> Sure, let's elaborate a bit. To reproduce the bug you can simply apply the diff:
>>
>> diff --git a/src/core/ngx_md5.c b/src/core/ngx_md5.c
>> index c25d0025d..67cc06438 100644
>> --- a/src/core/ngx_md5.c
>> +++ b/src/core/ngx_md5.c
>> @@ -24,6 +24,7 @@ ngx_md5_init(ngx_md5_t *ctx)
>>        ctx->d = 0x10325476;
>>
>>        ctx->bytes = 0;
>> +    getrandom(ctx->buffer, 64, 0);
>>    }
>>
> 
> Note that this won't compile, it also needs "#include <sys/random.h>".
> 
>> This code will emulate the garbage for the stack-allocated 'ngx_md5_t md5;' in
>> ngx_http_file_cache_create_key when nginx is running under the load. Then you
>> can use simple configuration:
>>
>> upstream test_001_origin {
>>     server 127.0.0.1:8000;
>> }
>>
>> proxy_cache_path /var/cache/nginx/test-001 keys_zone=test_001:10m max_size=5g
>> inactive=24h levels=1:2 use_temp_path=off;
>>
>> server {
>>     listen 127.0.0.1:8000;
>>
>>     location = /plain {
>>       return 200;
>>     }
>>
>> }
>>
>> server {
>>     listen 127.0.0.1:80;
>>
>>     location /oplain {
>>        proxy_cache test_001;
>>        proxy_cache_key /oplain;
>>        proxy_pass http://test_001_origin/plain/;
>>     }
>> }
>>
>>
>> Every time you call 'curl http://127.0.0.1/oplain'  a new cache file will be
>> created, but the md5sum of the file will be the same, meaining that the key
>> stored in the file is absolutely the same.
> 
> Note that the exact configuration will make "GET /plain/" requests
> to upstream server, resulting in 404 and nothing cached.  I've
> fixed this to actually match "location = /plain" and added
> "proxy_cache_valid 200 1h;" to ensure caching will actually work.
> 
> Still, I wasn't able to reproduce the issue you are seeing on
> FreeBSD 12.4 with gcc12, neither with default compilation flags as
> used by nginx, nor with --with-cc-opt="-flto -O2" and
> --with-ld-opt="-flto -O2".
> 
> On RHEL 9 (Red Hat Enterprise Linux release 9.1 (Plow) from
> redhat/ubi9 image in Docker) with gcc11 (gcc version 11.3.1
> 20220421 (Red Hat 11.3.1-2) (GCC), as installed with "yum install
> gcc") I wasn't able to reproduce this as well (also tested both
> with default compilation flags as provided by nginx, and cc/ld
> options "-flto -O2").
> 
> You may want to provide more details on how to reproduce this.
> Some exact steps you've actually tested might the way to go.
> 
>>>> Changing the code to use current implementation
>>>> of ngx_explicit_memzero() doesn't help because of link-time optimizations
>>>> enabled in RHEL 9 and derivatives. Glibc 2.34 found in RHEL 9 provides
>>>> explicit_bzero() function which should be used to avoid such optimization.
>>>> ngx_explicit_memzero() is changed to use explicit_bzero() if possible.
>>>
>>> The ngx_explicit_memzero() function is to be used when zeroed data
>>> are indeed not used afterwards, for example, to make sure
>>> passwords are actually eliminated from memory.  It shouldn't be
>>> used instead of a real ngx_memzero() call - doing so might hide
>>> the problem, which is either in the compiler or in nginx, but
>>> won't fix it.
>>
>> In this case the nginx code should be fixed to avoid partial memory fillings,
>> but such change will come with performance penalty, especially on the CPUs
>> without proper `REP MOVSB/MOVSD/MOVSQ` implementation. Controlled usage of
>> explicit zeroing is much better is this case.
> 
> You may want to elaborate on what "nginx code should be fixed to
> avoid partial memory fillings" means and why it should be
> fixed/avoided.
> 
>>> As for using explicit_bzero() for it, we've looked into various
>>> OS-specific solutions, though there are too many variants out
>>> there, so it was decided that having our own simple implementation
>>> is a better way to go.  If it doesn't work in the particular
>>> setup, it might make sense to adjust our implementation - but
>>> given the above, it might be the same issue which causes the
>>> original problem.
>>
>> Unfortunately, the memory barrier trick is not working anymore for linker-time
>> optimizations. Linker has better information about whether the stored
>> information is used again or not. And it will remove memset in such
>> implementation, and it will definitely affected security-related code you
>> mentioned above.
> 
> Without link-time optimization, just a separate compilation unit
> with a function is more than enough.
> 
> The ngx_memory_barrier() is additionally used as in many practical
> cases it introduces a compiler barrier, and this also defeats
> link-time optimization.  This might not be the case for GCC
> though, as with GCC we currently use __sync_synchronize() for
> ngx_memory_barrier().  Adding an explicit compiler barrier (asm
> with the "memory" clobber should work for most compilers, but not
> all) might be the way to go if it's indeed the case.
> 
> It does seem to work with GCC with link-time optimizations enabled
> though, as least in the RHEL 9 build with gcc11 "-flto -O2".  I'm
> seeing this in the disassemble of ngx_http_auth_basic_handler():
> 
>     0x00000000004709c6 <+918>:	rep stos %rax,%es:(%rdi)
>     0x00000000004709c9 <+921>:	lock orq $0x0,(%rsp)
> 
> So it looks like ngx_explicit_memzero() is inlined and optimized
> to use "rep stos" instead of memset() call, but not eliminated.
> 
>> explicit_bzero() function is available in well-loved *BSD
>> systems now and is a proper way to do cleaning of the artifacts, doesn't matter
>> which implementation is used in the specific system.
> 
> If I recall correctly, when I last checked there were something
> like 5 different interfaces out there, including explicit_bzero(),
> explicit_memset(), memset_s(), and SecureZeroMemory().  With
> memset_s() being required by C11 standard, but with absolutely
> brain-damaged interface.  (It looks like now there is also
> memset_explicit(), which is going to become a standard in C23.)
> 
> As such, the decision was to use our own function which does the
> trick in most practical cases.  And if for some reason it doesn't,
> this isn't a big issue: that's a mitigation technique at most.
> 

Looks like I found the root cause of the issue in the code added on top of nginx
implementation of md5, which is using “type-punning” for optimization reasons,
but ended with UB when NGX_HAVE_LITTLE_ENDIAN && NGX_HAVE_NONALIGNED are
defined. Sorry for the noise and many thanks to Alejandro for help, hint with
-fstrict-aliasing reminded me this very old change in the code.

All best,
Vadim


More information about the nginx-devel mailing list