Fix for ngx_unlock() and thus race conditions in AIO mechanism
Mindaugas Rasiukevicius
rmind at noxt.eu
Mon Apr 18 21:40:59 UTC 2016
Hi,
Some background: enabling AIO threads can result in "stuck" connections,
permanently waiting for the handler callback (on the system these connections
appear in the CLOSE_WAIT state). This happens due to race condition(s) in
the AIO thread pool code. Since the window is tiny, it may take hundreds
of thousands of requests on a many-core machine to trigger a single race,
but it is present in the real world: on some systems we accumulate over
a hundred of them per day.
Fix: add a compiler barrier to ngx_unlock(); since it is a macro expanding
to a plain store, it is subject to compiler reordering optimisations.
Consider the following fragment of the ngx_thread_pool_cycle() function:
ngx_spinlock(&ngx_thread_pool_done_lock, 1, 2048);
*ngx_thread_pool_done.last = task;
ngx_thread_pool_done.last = &task->next;
ngx_unlock(&ngx_thread_pool_done_lock);
(void) ngx_notify(ngx_thread_pool_handler);
On nginx 1.9.14 RHEL 6 build (nginx-1.9.14-1.el6.ngx.x86_64.rpm), you can
see the following assembly code:
438a05: bf 78 f6 70 00 mov $0x70f678,%edi
...
438a1c: e8 df a6 fe ff callq 423100 <ngx_spinlock>
438a21: 48 8b 05 60 6c 2d 00 mov 0x2d6c60(%rip),%rax
438a28: 48 c7 05 45 6c 2d 00 movq $0x0,0x2d6c45(%rip) # <--
438a2f: 00 00 00 00
438a33: bf b0 8a 43 00 mov $0x438ab0,%edi
438a38: 48 89 2d 49 6c 2d 00 mov %rbp,0x2d6c49(%rip)
438a3f: 48 89 28 mov %rbp,(%rax)
438a42: ff 15 08 72 2d 00 callq *0x2d7208(%rip)
You can see that at 0x438a28 it performs "*lock = 0" before inserting the
task into the queue. This can race with ngx_thread_pool_handler(), where
the ngx_thread_pool_done list is reset before the "task" is inserted (that
is, tasks just get lost).
On x86/amd64, the unlock operation only needs a *compiler* barrier; it does
not need a *memory* barrier. I used ngx_memory_barrier() in the patch, but
on x86/amd64 it can be optimised to __asm __volatile("" ::: "memory").
The patch is attached. An even better fix would be to use pthread_spin_lock
or simply pthread_mutex_lock. Why roll your own?
--
Mindaugas
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ngx_unlock.patch
URL: <http://mailman.nginx.org/pipermail/nginx-devel/attachments/20160418/fbabe82f/attachment.ksh>