[PATCH 0 of 4] Avoid dead store elimination in GCC 11+

Maxim Dounin mdounin at mdounin.ru
Tue Apr 18 19:14:24 UTC 2023


Hello!

On Tue, Apr 18, 2023 at 10:50:01AM +0100, Vadim Fedorenko wrote:

> On 18.04.2023 02:54, Maxim Dounin wrote:
> > Hello!
> > 
> > On Tue, Apr 18, 2023 at 02:07:06AM +0300, Vadim Fedorenko via nginx-devel wrote:
> > 
> >> GCC version 11 and newer use more aggressive way to eliminate dead stores
> >> which ends up removing ngx_memzero() calls in several places. Such optimization
> >> affects calculations of md5 and sha1 implemented internally in nginx. The
> >> effect could be easily observed by adding a random data to buffer array in
> >> md5_init() or sha1_init() functions. With this simple modifications the result
> >> of the hash computation will be different each time even though the provided
> >> data to hash is not changed.
> > 
> > If calculations of md5 and sha1 are affected, this means that the
> > stores in question are not dead, and they shouldn't be eliminated
> > in the first place.  From your description this looks like a bug
> > in the compiler in question.
> 
> Yeah, these ngx_memzero()s must not be dead, but according to the standart they
> are. In md5_final() the function is called this way: 
> ngx_memzero(&ctx->buffer[used], free - 8);
> That means that a new variable of type 'char *' is created with the life time
> scoped to the call to ngx_memzero(). As the result of of the function is ignored 
> explicitly, no other parameters are passed by pointer, and the variable is not 
> accessed anywhere else, the whole call can be optimized out.

The pointer is passed to the function, and the function modifies 
the memory being pointed to by the pointer.  While the pointer is 
not used anywhere else and can be optimized out, the memory it 
points to is used elsewhere, and this modification cannot be 
optimized out, so it is incorrect to remove the call according to 
my understanding of the C standard.

If you still think it's correct and based on the C standard, 
please provide relevant references (and quotes) which explain why 
these calls can be optimized out.

> > Alternatively, this can be a bug in nginx code which makes the
> > compiler think that it can eliminate these ngx_memzero() calls - for
> > example, GCC is known to do such things if it sees an undefined
> > behaviour in the code.
> 
> There is no undefined behavior unfortunately, everything in this place is well 
> defined.

Well, I don't think so.  There is a function call, and it cannot 
be eliminated by the compiler unless the compiler thinks that the 
results of the function call do not affect the program execution 
as externally observed.  Clearly the program execution is affected 
(as per your claim).  This leaves us the two possible 
alternatives:

- There is a bug in the compiler, and it incorrectly thinks that 
  the function call do not affect the program execution.

- There is a bug in the code, and it triggers undefined behaviour, 
  so the compiler might not actually know what happens in the code 
  (because it not required to do anything meaningful in case of 
  undefined behaviour, and simply assume it should never happen).

Just in case, the actual undefined behaviour might occur in the 
ngx_md5_body() function due to strict-aliasing rules being broken 
by the optimized GET() macro on platforms without strict alignment 
requirements if the original data buffer as provided to 
ngx_md5_update() cannot be aliased by uint32_t.  See this commit in 
the original repository of the md5 code nginx uses:

https://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/popa3d/popa3d/md5/md5.c.diff?r1=1.14;r2=1.15

But nginx only uses ngx_md5_update() with text buffers, so 
strict-aliasing rules aren't broken.

> > You may want to elaborate more on how to reproduce this, and, if
> > possible, how to build a minimal test case which demonstrates the
> > problem.
> 
> Sure, let's elaborate a bit. To reproduce the bug you can simply apply the diff:
> 
> diff --git a/src/core/ngx_md5.c b/src/core/ngx_md5.c
> index c25d0025d..67cc06438 100644
> --- a/src/core/ngx_md5.c
> +++ b/src/core/ngx_md5.c
> @@ -24,6 +24,7 @@ ngx_md5_init(ngx_md5_t *ctx)
>       ctx->d = 0x10325476;
> 
>       ctx->bytes = 0;
> +    getrandom(ctx->buffer, 64, 0);
>   }
> 

Note that this won't compile, it also needs "#include <sys/random.h>".

> This code will emulate the garbage for the stack-allocated 'ngx_md5_t md5;' in 
> ngx_http_file_cache_create_key when nginx is running under the load. Then you 
> can use simple configuration:
> 
> upstream test_001_origin {
>    server 127.0.0.1:8000;
> }
> 
> proxy_cache_path /var/cache/nginx/test-001 keys_zone=test_001:10m max_size=5g 
> inactive=24h levels=1:2 use_temp_path=off;
> 
> server {
>    listen 127.0.0.1:8000;
> 
>    location = /plain {
>      return 200;
>    }
> 
> }
> 
> server {
>    listen 127.0.0.1:80;
> 
>    location /oplain {
>       proxy_cache test_001;
>       proxy_cache_key /oplain;
>       proxy_pass http://test_001_origin/plain/;
>    }
> }
> 
> 
> Every time you call 'curl http://127.0.0.1/oplain'  a new cache file will be 
> created, but the md5sum of the file will be the same, meaining that the key 
> stored in the file is absolutely the same.

Note that the exact configuration will make "GET /plain/" requests 
to upstream server, resulting in 404 and nothing cached.  I've 
fixed this to actually match "location = /plain" and added 
"proxy_cache_valid 200 1h;" to ensure caching will actually work.

Still, I wasn't able to reproduce the issue you are seeing on 
FreeBSD 12.4 with gcc12, neither with default compilation flags as 
used by nginx, nor with --with-cc-opt="-flto -O2" and 
--with-ld-opt="-flto -O2".

On RHEL 9 (Red Hat Enterprise Linux release 9.1 (Plow) from 
redhat/ubi9 image in Docker) with gcc11 (gcc version 11.3.1 
20220421 (Red Hat 11.3.1-2) (GCC), as installed with "yum install 
gcc") I wasn't able to reproduce this as well (also tested both 
with default compilation flags as provided by nginx, and cc/ld 
options "-flto -O2").

You may want to provide more details on how to reproduce this.  
Some exact steps you've actually tested might the way to go.

> >> Changing the code to use current implementation
> >> of ngx_explicit_memzero() doesn't help because of link-time optimizations
> >> enabled in RHEL 9 and derivatives. Glibc 2.34 found in RHEL 9 provides
> >> explicit_bzero() function which should be used to avoid such optimization.
> >> ngx_explicit_memzero() is changed to use explicit_bzero() if possible.
> > 
> > The ngx_explicit_memzero() function is to be used when zeroed data
> > are indeed not used afterwards, for example, to make sure
> > passwords are actually eliminated from memory.  It shouldn't be
> > used instead of a real ngx_memzero() call - doing so might hide
> > the problem, which is either in the compiler or in nginx, but
> > won't fix it.
> 
> In this case the nginx code should be fixed to avoid partial memory fillings, 
> but such change will come with performance penalty, especially on the CPUs 
> without proper `REP MOVSB/MOVSD/MOVSQ` implementation. Controlled usage of 
> explicit zeroing is much better is this case.

You may want to elaborate on what "nginx code should be fixed to 
avoid partial memory fillings" means and why it should be 
fixed/avoided.

> > As for using explicit_bzero() for it, we've looked into various
> > OS-specific solutions, though there are too many variants out
> > there, so it was decided that having our own simple implementation
> > is a better way to go.  If it doesn't work in the particular
> > setup, it might make sense to adjust our implementation - but
> > given the above, it might be the same issue which causes the
> > original problem.
> 
> Unfortunately, the memory barrier trick is not working anymore for linker-time 
> optimizations. Linker has better information about whether the stored 
> information is used again or not. And it will remove memset in such 
> implementation, and it will definitely affected security-related code you 
> mentioned above.

Without link-time optimization, just a separate compilation unit 
with a function is more than enough.

The ngx_memory_barrier() is additionally used as in many practical 
cases it introduces a compiler barrier, and this also defeats 
link-time optimization.  This might not be the case for GCC 
though, as with GCC we currently use __sync_synchronize() for 
ngx_memory_barrier().  Adding an explicit compiler barrier (asm 
with the "memory" clobber should work for most compilers, but not 
all) might be the way to go if it's indeed the case.

It does seem to work with GCC with link-time optimizations enabled 
though, as least in the RHEL 9 build with gcc11 "-flto -O2".  I'm 
seeing this in the disassemble of ngx_http_auth_basic_handler():

   0x00000000004709c6 <+918>:	rep stos %rax,%es:(%rdi)
   0x00000000004709c9 <+921>:	lock orq $0x0,(%rsp)

So it looks like ngx_explicit_memzero() is inlined and optimized 
to use "rep stos" instead of memset() call, but not eliminated.

> explicit_bzero() function is available in well-loved *BSD 
> systems now and is a proper way to do cleaning of the artifacts, doesn't matter 
> which implementation is used in the specific system.

If I recall correctly, when I last checked there were something 
like 5 different interfaces out there, including explicit_bzero(), 
explicit_memset(), memset_s(), and SecureZeroMemory().  With 
memset_s() being required by C11 standard, but with absolutely 
brain-damaged interface.  (It looks like now there is also 
memset_explicit(), which is going to become a standard in C23.)

As such, the decision was to use our own function which does the 
trick in most practical cases.  And if for some reason it doesn't, 
this isn't a big issue: that's a mitigation technique at most.

-- 
Maxim Dounin
http://mdounin.ru/


More information about the nginx-devel mailing list