[PATCH] Removed the unsafe ngx_memcmp() wrapper for memcmp(3)

Alejandro Colomar alx.manpages at gmail.com
Sun Nov 6 22:50:38 UTC 2022

Hello Maxim!

On 11/5/22 03:39, Maxim Dounin wrote:
> The ngx_memcpy() macro was introduced in 107:b5be4b0448d3
> (nginx-0.0.1-2003-07-01-19:00:03 import) as an alias to memcmp(),
> with the following explanation why it's just an alias:
> +/* msvc and icc compile memcmp() to inline loop */
> +#define ngx_memcmp                memcmp
> This was along with the first use of memcmp in nginx, so clearly
> the "ngx_" prefix is not an accident.

Hmm, interesting point.

> Indeed, the prefix is used in multiple places to provide optimized
> or instrumented functions, even if normally they map to generally
> available functions such as memcmp().  For example, ngx_memcpy()
> usually maps to memcpy(), but instrumented to detect large copies
> if NGX_MEMCPY_LIMIT is defined.

Yeah, I can understand a wrapper over memcpy() and some others for those reasons.

These days of heavily optimized assembly specializations in libc, it's not so 
easy to outperform the libc implementation, but it might still be interesting to 
check when there are large bottlenecks.

memcmp(3) is likely to be less of a candidate for instrumenting and even less 
for optimization, but might be useful to have it around if its already there.  I 
would not write a wrapper if it didn't exist, but probably I wouldn't remove it 
now that it exists.  That makes some sense to me.

>  Even if the particular function
> is not instrumented, the prefix is important to make it possible
> to change things without huge modifications of the code, like the
> one in your patch.  Further, the prefix is also important for code
> consistency.

You may be interested in replacing these few that I just found, for added 

memcpy(&src_sockaddr.sockaddr_in.sin_addr, in->src_addr, 4);
memcpy(&dst_sockaddr.sockaddr_in.sin_addr, in->dst_addr, 4);
memcpy(&src_sockaddr.sockaddr_in6.sin6_addr, in6->src_addr, 16);
memcpy(&dst_sockaddr.sockaddr_in6.sin6_addr, in6->dst_addr, 16);

>  And, last but not least, it makes it possible to add
> casts if needed on some platforms
> (or remove them if they are
> considered wrong).
> In particular, casts to (const char *) for ngx_memcmp() were added
> in 2007 by this commit (nginx 0.6.18):
> changeset:   1648:89a47f19b9ec
> user:        Igor Sysoev <igor at sysoev.ru>
> date:        Fri Nov 23 17:00:11 2007 +0000
> summary:     update ngx_memcmp()
> Unfortunately, this commit contains no details about the
> particular system which needed the cast, though it looks like
> there were some at least at that time.

That date can make sense.  'void *' implementations were added after ANSI/ISO 
C89.  Systems designed before that year, would mostly have 'char *'.  Since some 
systems live up to 20+ years, by 2007 there would still be some pre-C89 systems 
around, which nginx had to support with the casts.

Now in 2022 it's been 33 years after ISO C89, so all systems that lack 'void *' 
are officially dead.  Nginx might still support some dead systems, though, so it 
merits a bit more investigation to confirm.

Since memcmp(3) comes implemented together with the other mem...(3) functions, 
checking those should help:

$ grepc ngx_memset
#define ngx_memset(buf, c, n)     (void) memset(buf, c, n)
$ grepc ngx_cpymem
#define ngx_cpymem(dst, src, n)   (((u_char *) ngx_memcpy(dst, src, n)) + (n))

#define ngx_cpymem(dst, src, n)   (((u_char *) memcpy(dst, src, n)) + (n))
$ grepc ngx_memmove
#define ngx_memmove(dst, src, n)   (void) memmove(dst, src, n)

These have no casts, so we can bet that if those work without a cast, it's 
because they are implemented in the system with 'void *'.  However, let's 
confirm that; after all, it could be that we only call them with types 
compatible with 'char *'.

`grep -rn ngx_memcpy.*data.*len` should find cases where ngx_str_t is being used 
with ngx_str_t.  It finds, among others, the following line:

ngx_memcpy(dst->server.data, src->server.data, src->server.len);

ngx_str_t uses the incompatible 'u_char *', so a 'char *' memcpy(3) wouldn't 
work without a cast, so memcpy(3) definitely must be implemented with 'void *', 
which implies that all mem...(3) functions, including memcmp(3), are, with a 
very strong certainty, implemented following ISO C prototypes (i.e., they use 
'void *').

Just to confirm, both 'dst' and 'src' are of type 
'ngx_stream_upstream_rr_peer_t', which is implemented as:

$ grepc ngx_stream_upstream_rr_peer_t
typedef struct ngx_stream_upstream_rr_peer_s   ngx_stream_upstream_rr_peer_t;

$ grepc ngx_stream_upstream_rr_peer_s
struct ngx_stream_upstream_rr_peer_s {
     ngx_str_t                        server;


So that 'server' is definitely a ngx_str_t, which confirms that there's u_char* 

> Similar casts are used by almost all nginx string macro
> definitions, since most C library functions accept "char *", while
> nginx uses "u_char *" for strings, and this results in useless
> warnings.

That's true of str...(3) functions from <string.h>, but the mem...(3) functions 
from <string.h> use 'void *', so conversions are automatic, and casts are 
unnecessary (after ANSI/ISO C89; previously, K&R C did use 'char *').

>  While it might be seen as something better to avoid,

Even if it's a bit off-topic, I'm very curious about the reason for using 
u_char.  It definitely requires a lot of extra work compared to 'char *': casts, 
type-safety, reviewing that code just works when workarounding/disabling the 
compiler warnings.  I'm guessing it was also some workaround for broken old 
implementations and it has just continued like that for consistency, but am 
curious if there are other better reasons.  Certainly, ASCII characters behave 
well (at least nowadays) independently of the signedness of char, and usually 
one doesn't do arithmetic with characters in strings.

> the only alternative would be to provide wrapper functions, which
> might not be a good idea for performance and readability reasons.

Yeah, GNU always_inline or C99 inline functions could be better than macros for 
performance, but those are not always available for systems supported by nginx, 
so macros are the way to go here.

> On the other hand, macro definitions with casts are easily
> readable, have no impact on performance, and provide limited to
> no effect on the code quality as long as proper coding, testing
> and review process is implemented.

Yes, they are the least evil, or even the only way, so that makes sense.

> Summing the above, certainly removing the "ngx_" prefix is a bad
> idea.

It can make sense, especially considering that nginx has always had it.

>  Removing casts from the particular macro might be
> considered, though in general it looks like a quest to prove these
> are not needed (or no longer needed) given the wide range of
> platforms nginx supports.

I don't know how far that support goes.  Maybe Windows XP?  Is that range even 

However, as far as the quest, I think the rationale above might be enough to 
prove it even without investigating specific systems.  It could certainly be 
possible that some system had a very brain-damaged <string.h> where memcmp(3) 
used a 'char *' parameter and all other mem...(3) functions used 'void *', but I 
don't think that's likely at all.

> Hope this helps.

For sure!  Thank you very much :)



-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <http://mailman.nginx.org/pipermail/nginx-devel/attachments/20221106/64ca12ed/attachment.bin>

More information about the nginx-devel mailing list