[PATCH] SO_REUSEPORT support for listen sockets (round 3)

Maxim Dounin mdounin at mdounin.ru
Mon Sep 2 14:49:27 UTC 2013


Hello!

(Sorry again for the late reply.  See below for comments.)

On Fri, Aug 02, 2013 at 01:16:53PM +0800, Sepherosa Ziehau wrote:

> Here is another round of SO_REUSEPORT support.  The plot has changed a
> little bit to allow smooth configuration reloading and binary upgrading.
> Here is what happens when so_reuseport is enabled (this does not affect
> the single process model):
> - Master creates the listen sockets w/ SO_REUSEPORT, but does not configure them
> - The first worker process will inherit the listen sockets created by
> master and configure them
> - After master forked the first worker process all listen sockets are closed
> - The rest of the workers will create their own listen sockets w/ SO_REUSEPORT
> - During binary upgrade, listen sockets are no longer passed through
> environment variables, since new master will create its own listen
> sockets.  Well, the old master actually does not have any listen
> sockets opened :).
> 
> The idea behind this plot is that at any given time, there is always
> one listen socket left, which could inherit the syncaches and pending
> sockets on the to-be-closed listen sockets.  The inheritance itself is
> handled by the kernel; I implemented this inheritance for DragonFlyBSD
> recently (http://gitweb.dragonflybsd.org/dragonfly.git/commit/02ad2f0b874fb0a45eb69750219f79f5e8982272).
>  I am not tracking Linux's code, but I think Linux side will
> eventually get (or already got) the proper fix.
> 
> The patch itself:
> http://leaf.dragonflybsd.org/~sephe/ngx_soreuseport3.diff
> 
> Configuration reloading and binary upgrading will not be interfered
> with, as they were w/ the first 2 patches.
> 
> Binary upgrade reverting method 1 ("Send the HUP signal to the old
> master process. ...") will not be interfered with, as it was w/ the
> first 2 patches.  There could still be some glitches (though not as
> bad as w/ the first 2 patches) if binary upgrade reverting method 2
> ("Send the TERM signal to the new master process. ...") is used.  I
> think we probably just need to mention that in the documentation.

While this looks better than the previous patches (mostly because 
the inheritance is handled by the kernel), it still looks very 
fragile to me.  In particular, I really dislike the trick of 
making the first worker process special.

It should probably either be left in a "nothing is guaranteed" 
state (with some understanding of what will happen in various 
common situations like reconfiguration, upgrade, switching 
so_reuseport on/off), or some way should be found to make things 
less tricky.
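
As a rough illustration of the overlap the quoted plan relies on, 
here is a minimal sketch in Python (not nginx code).  It assumes a 
Linux kernel with SO_REUSEPORT; the names open_listener and 
handover_demo are made up for this example.  A second listener is 
bound to the same port before the first one is closed, so the port 
never goes unbound.  It says nothing about connections already 
queued on the closed socket, which is the part left to the kernel.

```python
import socket


def open_listener(port, host="127.0.0.1"):
    """Create a TCP listen socket with SO_REUSEPORT set before bind()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))
        s.listen(16)
    except OSError:
        s.close()
        raise
    return s


def handover_demo():
    """Open a second listener before closing the first; return True if a
    connection made after the close is accepted by the second one."""
    first = open_listener(0)           # port 0: let the kernel pick a port
    port = first.getsockname()[1]
    second = open_listener(port)       # same port, also with SO_REUSEPORT
    second.settimeout(2)               # do not hang if nothing arrives
    first.close()                      # "old" listener goes away

    client = socket.create_connection(("127.0.0.1", port), timeout=2)
    try:
        conn, peer = second.accept()
        try:
            return peer == client.getsockname()
        finally:
            conn.close()
    finally:
        client.close()
        second.close()
```

Calling handover_demo() exercises only the simple case where the new 
connection arrives after the old socket is gone.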

An additional question to consider is what happens with security 
checks.  Linux seems to require the process user id to match on 
SO_REUSEPORT sockets, and I would expect this to fail if there are 
sockets opened both in master and in worker processes; privileged 
port checks might cause problems as well.
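
The user id check itself cannot be shown without switching users, so 
the sketch below (Python, Linux assumed, helper name try_bind is 
hypothetical) only demonstrates a related rule: every socket sharing 
the port has to set SO_REUSEPORT, and a bind() that does not qualify 
is refused.  The uid mismatch case is not exercised here.

```python
import errno
import socket


def try_bind(port, reuseport, host="127.0.0.1"):
    """Try to bind and listen on (host, port).

    Returns (socket, 0) on success or (None, errno) on failure.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuseport:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))
        s.listen(16)
    except OSError as e:
        s.close()
        return None, e.errno
    return s, 0


def sharing_demo():
    """Return the errno seen without the option and whether a socket
    with the option could share the port."""
    first, _ = try_bind(0, reuseport=True)     # kernel picks the port
    port = first.getsockname()[1]
    plain, plain_errno = try_bind(port, reuseport=False)
    shared, _ = try_bind(port, reuseport=True)
    ok = shared is not None
    for s in (first, plain, shared):
        if s is not None:
            s.close()
    return plain_errno, ok
```

On Linux I would expect the bind without the option to fail with 
EADDRINUSE while the one with the option succeeds.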

(We've also discussed this here in the office several times, and 
the general consensus seems to be that SO_REUSEPORT isn't really a 
good interface for TCP balancing.  It would be much easier for 
everyone if the normal workflow with inherited listen socket 
descriptors just worked.  Especially given that in nginx's case 
it's mostly about benchmarking, since in real life load 
distribution between worker processes is good enough.)

-- 
Maxim Dounin
http://nginx.org/en/donation.html


