accept_mutex causes nginx worker balance problem

Maxim Dounin mdounin at
Mon Aug 4 14:57:51 UTC 2014


On Sun, Aug 03, 2014 at 10:47:26PM -0400, xinghua_hi wrote:

> hello,
>        I still can't understand why accept_mutex causes disbalance. In the
> code below, multiple workers will try to get the mutex, and the question is:
> why can one worker always get the mutex? I tested many times and found that
> one worker always accepts many more new connections than the others. Thanks
> very much.

Only the worker which holds the accept mutex will try to accept 
new connections.  Other workers will only process events they 
already have, or try to grab the accept mutex again after a 500ms 
timeout (accept_mutex_delay[1]) if there are no other events to 
handle.

Consider a short test on an otherwise idle server, like the one 
you are doing, with many connections established during a small 
period of time.  Assume there are 2 workers:

- worker A holds the accept mutex, worker B waits for a 500ms 
  timeout doing nothing;

- in a short period of time 1000 connections come in;

- worker A is woken up by the kernel and accepts a connection;

- worker A goes back to the kernel to wait for more data; since 
  worker B is in the kernel waiting for a 500ms timeout, the accept 
  mutex is again locked by A;

- worker A is woken up again, and the above repeats multiple times.

More or less this continues till worker B wakes up after 500ms and 
tries to lock the accept mutex.  If it is lucky and this happens 
when worker A is doing something, it will be able to lock the 
accept mutex.  That is, further connections will be accepted by 
worker B.  If worker B isn't lucky, then worker A will keep 
accepting connections for longer.  For short tests this may mean that all 
connections will be accepted by a single worker.  (And things will 
be even worse if multi_accept[2] is used.)
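
As a rough illustration of the sequence above, here is a toy model 
in Python.  It is only a sketch and not how nginx is implemented: 
the function name simulate_burst, its parameters and the evenly 
spaced arrival times are assumptions made for the example, and 
after a hand-off it simply lets worker B keep the mutex.

```python
def simulate_burst(n_connections, burst_ms, delay_ms=500,
                   b_wins_on_wakeup=True):
    """Toy two-worker model; returns (accepted_by_a, accepted_by_b).

    Hypothetical simplification: connections arrive evenly spaced
    over burst_ms, and worker B only gets a chance to take the
    accept mutex when its delay_ms timer fires.
    """
    accepted = {"A": 0, "B": 0}
    holder = "A"                 # worker A holds the mutex at t=0
    next_b_wakeup = delay_ms     # worker B sleeps until its timeout

    for i in range(n_connections):
        arrival = i * burst_ms / n_connections
        # Let B's timer fire for every wakeup that is due by now.
        while holder == "A" and arrival >= next_b_wakeup:
            if b_wins_on_wakeup:     # "lucky": A was busy at that moment
                holder = "B"
            next_b_wakeup += delay_ms
        accepted[holder] += 1

    return accepted["A"], accepted["B"]
```

With these assumptions, simulate_burst(1000, 200) returns (1000, 0): 
the whole burst ends before worker B's first wakeup.  Stretching 
the same 1000 connections over 1000ms gives (500, 500), and with 
b_wins_on_wakeup=False worker A again gets all of them.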

On a normally loaded server the above situation isn't likely to 
happen as all workers are periodically woken up by the kernel, and 
will try to lock the accept mutex when going back to the kernel.  Thus 
connections are distributed among all workers more or less evenly.  
In short tests though, accept_mutex can easily cause disbalance as 
described above.
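
The contrast between an idle and a loaded server can be sketched 
with another toy model in Python.  Everything here is an assumption 
made for illustration (the name simulate_loaded, the busy_prob 
parameter, and the rule that the mutex changes hands only when the 
holder is busy with other events while the other worker is going 
back to the kernel); it is not taken from the nginx sources.

```python
import random


def simulate_loaded(n_connections, busy_prob, seed=0):
    """Toy two-worker model; returns (accepted_by_0, accepted_by_1).

    busy_prob is a hypothetical knob: the chance that a worker is
    woken up by unrelated events between two new connections.
    """
    rng = random.Random(seed)    # seeded so runs are repeatable
    accepted = [0, 0]
    holder = 0                   # worker 0 starts with the accept mutex

    for _ in range(n_connections):
        # The holder gives up the mutex only while it handles events.
        holder_busy = rng.random() < busy_prob
        # The other worker tries to lock only when it was woken up
        # itself and is on its way back to the kernel.
        other_returning = rng.random() < busy_prob
        if holder_busy and other_returning:
            holder = 1 - holder
        accepted[holder] += 1

    return tuple(accepted)
```

With busy_prob=0.0 (an idle server) the mutex never changes hands 
in this model and worker 0 accepts everything.  With a value such 
as 0.5 the two counts come out roughly equal over many connections.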


Maxim Dounin

