[PATCH] SO_REUSEPORT support for listen sockets (round 3)

Maxim Dounin mdounin at mdounin.ru
Thu Sep 5 17:28:19 UTC 2013


On Thu, Sep 05, 2013 at 02:47:34PM +0800, Sepherosa Ziehau wrote:


> > Another approach which may be slightly better than the code in your
> > last patch is to reopen sockets before spawning each worker
> > process: this way, master may keep listen sockets open (listen
> > queue is shared with the same socket as inherited by a worker
> > process then, right?) and worker processes are equal and don't
> > need to open sockets themselves.  It needs careful handling on the dead
> > process respawn code path, though.
> >
> This may be doable and could be better than my approach.  I will take a look
> at the code and try implementing it.

Please note that "before" above wasn't well thought out; 
"after" might be better.
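A minimal sketch of the per-worker socket scheme discussed above, assuming Linux-style SO_REUSEPORT semantics and a loopback TCP listener. `struct worker_slot`, `open_worker_sockets()`, `spawn_worker()` and `idle_worker()` are hypothetical names, not nginx code. The master opens one socket per worker before forking. A forked child shares that same socket (and so the same listen queue). The master keeps its own descriptor open, so a worker that dies does not take its socket with it.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bookkeeping the master keeps for each worker. */
struct worker_slot {
    int   fd;    /* this worker's own listen socket, held open by the master */
    pid_t pid;   /* current worker process, or -1 if none */
};

/* Open a TCP listen socket on 127.0.0.1:port with SO_REUSEPORT set. */
static int open_reuseport_listener(uint16_t port)
{
    struct sockaddr_in sin;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)) == -1
        || bind(fd, (struct sockaddr *) &sin, sizeof(sin)) == -1
        || listen(fd, 511) == -1)
    {
        close(fd);
        return -1;
    }
    return fd;
}

/* Return the local port fd is bound to, or 0 on error. */
uint16_t local_port(int fd)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);

    if (getsockname(fd, (struct sockaddr *) &sin, &len) == -1)
        return 0;
    return ntohs(sin.sin_port);
}

/* Master: open one socket per worker, all bound to the same port. */
int open_worker_sockets(struct worker_slot *w, int n, uint16_t port)
{
    for (int i = 0; i < n; i++) {
        w[i].pid = -1;
        w[i].fd = open_reuseport_listener(port);
        if (w[i].fd == -1) {
            while (i-- > 0)          /* undo the sockets opened so far */
                close(w[i].fd);
            return -1;
        }
        if (port == 0)
            port = local_port(w[i].fd);   /* reuse the kernel-chosen port */
    }
    return 0;
}

/*
 * Spawn (or respawn) worker i.  The child closes the other workers'
 * sockets and runs on w[i].fd; the master keeps its copy open, so the
 * socket and its queued connections survive a worker's death.
 */
pid_t spawn_worker(struct worker_slot *w, int n, int i,
                   void (*worker_main)(int fd))
{
    pid_t pid = fork();

    if (pid == 0) {
        for (int j = 0; j < n; j++)
            if (j != i)
                close(w[j].fd);
        worker_main(w[i].fd);
        _exit(0);
    }
    if (pid > 0)
        w[i].pid = pid;
    return pid;
}

/* Placeholder worker body; a real worker would run its accept loop here. */
void idle_worker(int fd)
{
    (void) fd;
}
```

The master would call `spawn_worker()` again for slot i when `waitpid()` reports that `w[i].pid` has exited. Connections the kernel assigned to that socket in the meantime simply wait in its queue. Getting that window right is one of the details the respawn code path would have to handle.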

> > > In DragonFly, SO_REUSEPORT is more than load balancing: it makes the accepted
> > > sockets' network processing completely CPU localized (from user land to
> > > kernel land on both the RX and TX paths).  This level of network processing CPU
> > > localization could not be achieved by the old listen socket inheritance
> > > usage model (even if I could divide the listen socket's completion queue per
> > > CPU based on RX hash, the level of CPU localization achieved by
> > > SO_REUSEPORT still could not be achieved easily).
> >
> > Could you please point out how it's achieved?
> >
> >
> I have just put something up, which may help in understanding what I have
> described above.  Here it is:
> http://leaf.dragonflybsd.org/~sephe/netisr_so_reuseport.txt

Thanks a lot.

> > We here tend to think that proper interface from an application
> > point of view would be to implement a socket option which
> > basically creates separate listen queues for inherited sockets.
> > But if this isn't going to work, it's probably better to focus on
> >
> Well, I think I am going to stick w/ SO_REUSEPORT, mainly because the
> implementation is simple, straightforward, less invasive and the result is
> good.  Besides, user space applications only need small changes to the
> listen socket related code (most of the time, it is quite simple), which
> means easy adoption.  And in addition to TCP listen socket, SO_REUSEPORT
> also helps UDP socket reception load distribution and processing CPU
> localization.

Thanks, your position is clear enough and I understand your 
points - SO_REUSEPORT indeed looks like a simple and effective 
approach from the kernel's point of view, and we can probably live with 
it from nginx's point of view too.
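To illustrate how small the application-side change is, here is a minimal sketch of opening a reuseport socket for either TCP or UDP. It assumes a loopback IPv4 address and Linux/BSD-style SO_REUSEPORT, where every socket sharing the port must enable the option before `bind()`. `open_reuseport_socket()` and `local_port()` are hypothetical helpers, not nginx functions.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a socket of the given type (SOCK_STREAM or SOCK_DGRAM) bound to
 * 127.0.0.1:port with SO_REUSEPORT enabled; TCP sockets are also put
 * into the listening state.  port == 0 lets the kernel choose.
 * Returns the descriptor, or -1 on error.
 */
int open_reuseport_socket(int type, uint16_t port)
{
    struct sockaddr_in sin;
    int on = 1;
    int fd = socket(AF_INET, type, 0);

    if (fd == -1)
        return -1;

    /* The one extra call compared to a plain listener: it must come
     * before bind(), and every socket sharing the port must set it. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)) == -1)
        goto failed;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *) &sin, sizeof(sin)) == -1)
        goto failed;

    if (type == SOCK_STREAM && listen(fd, 511) == -1)
        goto failed;

    return fd;

failed:
    close(fd);
    return -1;
}

/* Return the local port fd is bound to, or 0 on error. */
uint16_t local_port(int fd)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);

    if (getsockname(fd, (struct sockaddr *) &sin, &len) == -1)
        return 0;
    return ntohs(sin.sin_port);
}
```

In a prefork server each worker would call this with the same port (typically after `fork()`), and the kernel then spreads new connections or datagrams across the sockets. The distribution policy itself is kernel-specific.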

We were thinking about some way to implement per-process 
listen queues for sockets, probably explicitly created with some 
setsockopt to avoid the need to look into a shared queue.  I 
think it should still be possible to achieve a similar level of 
CPU locality this way, and it should require fewer application changes 
than SO_REUSEPORT.  On the other hand, it's likely more intrusive from 
the kernel's point of view (and it's another interface).

> > BTW, are you going to be at the upcoming EuroBSDcon?  I'm not, but
> > Igor and Gleb Smirnoff (glebius at freebsd.org) will be there, and it
> > would be cool if you could meet and discuss the SO_REUSEPORT usage for
> > balancing.
> >
> >
> Sorry, I am not going to attend EuroBSDcon.  However, it would be cool if
> we could discuss SO_REUSEPORT, or whatever you folks are planning, by
> email.

One of the questions we are trying to solve is whether we are 
going to work on SO_REUSEPORT balancing support in FreeBSD.  Gleb 
(who is the primary person here to do the actual work) is very 
busy right now due to the upcoming FreeBSD 10 code freeze, but he 
promised to look into the details and discuss this with other 
network stack developers at EuroBSDcon.

Maxim Dounin

More information about the nginx-devel mailing list