Core: Avoid memcpy from NULL
Ben Kallus
benjamin.p.kallus.gr at dartmouth.edu
Wed Jan 24 00:09:02 UTC 2024
Hi Maxim,
> As already pointed out previously, there are no known cases
> when memcpy(p, NULL, 0) can result in miscompilation of nginx
> code, ... If you think there are cases when the code can be
> miscompiled in practice, and not theoretically, please share.
There is no such thing as "miscompilation" of code that executes
undefined behavior. The behavior is undefined; literally any
instructions that the compiler emits is correct compilation. This is
the definition of undefined behavior.
You want me to cite a line in nginx that you would consider
"miscompiled in practice." I'm not going to spend hours combing
through assembly to convince you that undefined behavior is worth
avoiding. Sorry!
> as nginx usually does not checks string data pointers
> against NULL (and instead checks length, if needed). In
> particular, ngx_pstrdup() you are trying to patch doesn't. That
> is, this is exactly the "no direct impact" situation as assumed
> above.
It is non-obvious that checks against NULL will be optimized away
after calls to ngx_memcpy. Whether a function even calls ngx_memcpy on
a given pointer may not always be obvious, especially if that call
happens many layers down the stack.
The argument "but we just don't do checks against NULL after calling
memcpy" (much like the argument "but we just don't pass macro
arguments that would violate operator precedence or cause side effects
to be reevaluated") is a bad one because it requires all contributors
to follow a set of undocumented rules which, when broken, can have
serious consequences. Futher, the rules will change in the future as
new compiler optimizations are developed and enabled by default. For
now, NULL checks are to be avoided after calling memcpy on GCC -O2. In
the future, it's plausible that they may need to be avoided before
calling memcpy as well, and on clang -O1. Compiler authors do not all
care about the fact that you believe that the standard is wrong. Their
job is to make compliant programs run correctly and fast; they have no
similar obligation for programs that execute UB.
Most other C projects accept that avoiding all UB is necessary, even
when somewhat inconvenient. Search "memcpy null" in the commit history
of your C or C++ project of choice and you can see this for yourself.
Every project I tried (OpenSSL, Apache httpd, FFmpeg, curl, WebKit,
CPython) has at least 2 commits that work around memcpy from NULL,
usually by adding a length check.
As for what is the nicest way to avoid NULL memcpy, that is a matter
of taste. I personally think that it is needless to add an extra
branch to every memcpy, even when that memcpy's arguments are known to
be nonnull. I therefore advocate for a piecemeal approach, which also
seems to be the more common one. Patching ngx_memcpy, as you suggest,
is also a valid solution to the issue, and which is better is a matter
of opinion on which I don't have strong feelings.
-Ben
(
There is a proposal for C2y to define memcpy(NULL,NULL,0):
https://docs.google.com/document/d/1guH_HgibKrX7t9JfKGfWX2UCPyZOTLsnRfR6UleD1F8/edit
If you feel strongly that memcpy from NULL should be defined, feel
free to contribute to it :)
)
More information about the nginx-devel
mailing list