[PATCH] SO_REUSEPORT support for listen sockets (round 3)

Sepherosa Ziehau sepherosa at gmail.com
Thu Sep 5 06:47:34 UTC 2013

On Tue, Sep 3, 2013 at 10:36 PM, Maxim Dounin <mdounin at mdounin.ru> wrote:

> Hello!


> > Well, the idea is to keep at least one listen socket opened.  Maybe I
> > could find other way in kernel to make it less tricky.  However, that
> > may add extra syscall or socket option.
> I think extra syscall/socket option will be ok as long as it'll
> save us from the hassle of opening sockets.  Not sure what to do
> with Linux compatibility though.

Yeah, this is also my concern.

> Another approach which may be slightly better than the code in your
> last patch is to reopen sockets before spawning each worker
> process: this way, master may keep listen sockets open (listen
> queue is shared with the same socket as inherited by a worker
> process then, right?) and worker processes are equal and don't
> need to open sockets themselves.  It needs careful handling on dead
> process respawn codepath though.

This may be doable and could be better than my approach.  I will take a
look at the code and try implementing it.

> > > (We've also discussed this here in office several times, and it
> > > seems that general consensus is that SO_REUSEPORT for TCP balancing
> > > isn't really good interface.  It would be much easier for everyone
> > > if normal workflow with inherited listen socket descriptors just
> > > worked.  Especially given the fact that in nginx case it's mostly
> > > about benchmarking, since in real life load distribution between
> > > worker processes is good enough.)
> >
> >
> > In DragonFly, SO_REUSEPORT is more than load balancing: it makes the
> > accepted sockets' network processing completely CPU localized (from
> > user land to kernel land on both RX and TX path).  This level of
> > network processing localization could not be achieved by the old
> > listen socket inheritance usage model (even if I could divide listen
> > socket's completion queue to each CPU based on RX hash, the level of
> > CPU localization achieved by SO_REUSEPORT still could not be achieved
> > easily).
> Could you please point out how it's achieved?

I have just put something up, which may help understanding what I have
described above.  Here it is:

> We here tend to think that proper interface from an application
> point of view would be to implement a socket option which
> basically creates separate listen queues for inherited sockets.
> But if this isn't going to work, it's probably better to focus on

Well, I think I am going to stick w/ SO_REUSEPORT, mainly because the
implementation is simple, straightforward, less invasive and the result is
good.  Besides, user space applications only need small changes to the
listen socket related code (most of the time, it is quite simple), which
means easy adoption.  And in addition to TCP listen socket, SO_REUSEPORT
also helps UDP socket reception load distribution and processing CPU

> BTW, are you going to be on the upcoming EuroBSDcon?  I'm not, but
> Igor and Gleb Smirnoff (glebius at freebsd.org) will be there, and it
> will be cool if you'll meet and discuss the SO_REUSEPORT usage for
> balancing.

Sorry, I am not going to attend EuroBSDcon.  However, it would be cool if
we could discuss SO_REUSEPORT, or whatever you folks are planning, through
email.

Best Regards,

Tomorrow Will Never Die