[PATCH] SO_REUSEPORT support for listen sockets (round 3)
mdounin at mdounin.ru
Mon Sep 2 14:49:27 UTC 2013
(Sorry again for late reply. See below for comments.)
On Fri, Aug 02, 2013 at 01:16:53PM +0800, Sepherosa Ziehau wrote:
> Here is another round of SO_REUSEPORT support. The plot is changed a
> little bit to allow smooth configure reloading and binary upgrading.
> Here is what happens when so_reuseport is enabled (this does not affect
> single process model):
> - Master creates the listen sockets w/ SO_REUSEPORT, but does not configure them
> - The first worker process will inherit the listen sockets created by
> master and configure them
> - After master forked the first worker process all listen sockets are closed
> - The rest of the workers will create their own listen sockets w/ SO_REUSEPORT
> - During binary upgrade, listen sockets are no longer passed through
> environment variables, since new master will create its own listen
> sockets. Well, the old master actually does not have any listen
> sockets opened :).
> The idea behind this plot is that at any given time, there is always
> one listen socket left, which could inherit the syncaches and pending
> sockets on the to-be-closed listen sockets. The inheritance itself is
> handled by the kernel; I implemented this inheritance for DragonFlyBSD
> recently (http://gitweb.dragonflybsd.org/dragonfly.git/commit/02ad2f0b874fb0a45eb69750219f79f5e8982272).
> I am not tracking Linux's code, but I think Linux side will
> eventually get (or already got) the proper fix.
> The patch itself:
> Configuration reloading and binary upgrading will not be interfered as
> w/ the first 2 patches.
> Binary upgrading reverting method 1 ("Send the HUP signal to the old
> master process. ...") will not be interfered as w/ the first 2
> patches. There still could be some glitch (but not that worse as w/
> the first 2 patches) if binary upgrading reverting method 2 ("Send the
> TERM signal to the new master process. ...") is used. I think we
> probably just need to mention that in the document.
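For reference, the mechanism the quoted scheme relies on (several
independent listen sockets bound to the same address and port) can
be sketched roughly as follows. This is only an illustration in
Python, not code from the patch; the helper name reuseport_listener
and the loopback address are assumptions made for the example, and
it needs a platform that defines SO_REUSEPORT.

```python
import socket


def reuseport_listener(port, host="127.0.0.1", backlog=128):
    """Create a TCP listen socket with SO_REUSEPORT set.

    Hypothetical helper for illustration only; it is not part of
    nginx or of the patch under discussion.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The option has to be set before bind(), and on every socket
        # that is going to share the address and port.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))
        s.listen(backlog)
    except OSError:
        s.close()
        raise
    return s
```

Calling reuseport_listener(0) lets the kernel pick a free port;
further calls with that port number then give additional listen
sockets on the same port, which is what each worker would do in
the scheme described above.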
While this looks better than what was in the previous patches
(mostly due to the inheritance being handled by the kernel), it still
looks very fragile to me. In particular, I really dislike the trick
of making the first worker process special.
It should probably either be left in the state "nothing is
guaranteed" (with some understanding of what will happen in
various common situations like reconfiguration, upgrade, switching
so_reuseport on/off) or some way should be found to make things
An additional question to consider is what happens with security
checks. Linux seems to require the process user ID to match on
SO_REUSEPORT sockets, and I would expect this to fail if there are
sockets opened both in master and in worker processes; and
privileged port checks might cause problems as well.
(We've also discussed this here in the office several times, and it
seems that the general consensus is that SO_REUSEPORT for TCP balancing
isn't really a good interface. It would be much easier for everyone
if the normal workflow with inherited listen socket descriptors just
worked. Especially given the fact that in the nginx case it's mostly
about benchmarking, since in real life load distribution between
worker processes is good enough.)
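To illustrate what "inherited listen socket descriptors" means
here, below is a rough Python sketch of the general pattern: the
parent creates one listen socket, and forked workers accept on the
inherited descriptor. It is not nginx code; the names make_listener
and spawn_workers are made up for the example, and each worker
handles just one connection to keep the sketch short.

```python
import os
import socket


def make_listener(host="127.0.0.1", port=0, backlog=128):
    """Create a single listen socket in the parent (hypothetical helper)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((host, port))
    s.listen(backlog)
    return s


def spawn_workers(listener, count):
    """Fork `count` workers that all accept on the inherited socket.

    Each worker accepts one connection, replies with its own pid and
    exits. Returns the list of worker pids to the parent.
    """
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: the listen descriptor was inherited across fork(),
            # so no second bind() and no SO_REUSEPORT is needed.
            try:
                conn, _ = listener.accept()
                conn.sendall(str(os.getpid()).encode())
                conn.close()
            finally:
                os._exit(0)
        pids.append(pid)
    return pids
```

All workers wait on the same accept queue, so it is the kernel
that decides which worker gets each connection; the parent only
has to keep the descriptor open and reap the children with
os.waitpid().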