inline

Wed Jun 1 23:49:01 UTC 2022

Hi,

We define `nxt_inline` as:

$ grepc nxt_inline
./src/nxt_clang.h:11:
#define nxt_inline     static inline __attribute__((always_inline))

I'm concerned that we may be inlining too much.

See for example the following function calls:

$ grep -rnA1 nxt_slow_path | grep -B1 nxt_thread_get_tid
src/nxt_thread.c:247:    if (nxt_slow_path(thr->tid == 0)) {
src/nxt_thread.c-248-        thr->tid = nxt_thread_get_tid();
--
src/nxt_thread.c:257:        if (nxt_slow_path(thr->tid == 0)) {
src/nxt_thread.c-258-            thr->tid = nxt_thread_get_tid();

Looks good, but then see the definition of nxt_thread_get_tid() (it has 
several difinitions, one for each system, but I'll copy here one that is 
pretty big (NXT_AIX)):

$ grepc nxt_thread_get_tid
[...]

./src/nxt_thread_id.h:107:
nxt_inline nxt_tid_t
nxt_thread_get_tid(void)
{
     int                  err, size;
     pthread_t            pt;
     struct __pthrdsinfo  ti;

     size = 0;
     pt = pthread_self();

     err = pthread_getthrds_np(&pt, PTHRDSINFO_QUERY_TID, &ti,
                             sizeof(struct __pthrdsinfo), NULL, size);

     if (nxt_fast_path(err == 0)) {
         return ti.__pi_tid;
     }

     nxt_main_log_alert("pthread_getthrds_np(PTHRDSINFO_QUERY_TID) 
failed %E",
                        err);
     return 0;
}

[...]

We're inlining a *huge* function in some cases where we know it's a slow 
path.  We could do better by marking the function as inline (without 
[[gnu::always_inline]]), and let the compiler decide that it doesn't 
want to inline when it's in an `nxt_slow_path()`.

My proposal is to split the inlining to two cases: `static inline` (for 
.c files) and C99 `inline` (for .h files), and have `nxt_always_inline` 
as a separate attribute that we can add selectively depending on each 
function (i.e., not adding it blindly to all functions).

For a deep explanation of the different `inline` models there are, see 
<https://www.greenend.org.uk/rjk/tech/inline.html>.

Let me summarize:

`static inline`:  This should only be used in .c files.  Basically, 
`static` should only be applied to functions local to a translation 
unit, as in non-inline static functions.  The problem of using `static 
inline` in headers is that the compiler emits duplicated function code 
in every translation unit (code bloat), for when the function is not 
inlined.  There's an exception, which is when it is used together with 
`[[gnu::always_inline]]`, which we do (I guess for this reason), but in 
that case `static` is redundant (`inline [[gnu::always_inline]]` does 
the same thing).

`inline`:  We should use the C99 model.  The GNU `inline` model is 
broken, and anyway, compilers default to C99 inline if the C version 
required in the command line is >= C99.  We should use the 
`-fno-gnu89-inline` compiler flag if available to make sure the compiler 
doesn't fallback to GNU inline mode.  Anyway, sine we use other C99 
features, this should be fine.  So, following the C99 inline model, we 
should put inline functions in headers, and then one `extern` 
declaration for each inline function in a .c file.

`[[gnu::always_inline]]` (aka `__attribute__((__always_inline__))`):  We 
should use this only when we are very sure that we want to *always* 
inline (i.e., oneliners; nxt_var_is_const() would be a great candidate).

I'd also like to have some benchmark to test that none of these changes 
reduce performance, since changes to the inlining could trigger 
important performance changes (for good or for bad, we'll see).  Do we 
have such a thing?

Cheers,

Alex

-- 
Alejandro Colomar
<http://www.alejandro-colomar.es/>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <http://mailman.nginx.org/pipermail/unit/attachments/20220602/fefb6140/attachment.bin>