[PATCH] SO_REUSEPORT support for listen sockets (round 3)

Maxim Dounin mdounin at mdounin.ru
Tue Sep 3 14:36:44 UTC 2013


Hello!

On Tue, Sep 03, 2013 at 10:31:55AM +0800, Sepherosa Ziehau wrote:

[...]

> > While this looks better than what we had with the previous patches
> > (mostly due to inheritance being handled by the kernel), it still looks
> > very fragile to me.  In particular, I really dislike the trick of
> > making the first worker process special.
> >
> >
> Well, the idea is to keep at least one listen socket open.  Maybe I could
> find another way in the kernel to make it less tricky.  However, that may
> add an extra syscall or socket option.

I think an extra syscall/socket option will be ok as long as it
saves us from the hassle of opening sockets.  Not sure what to do
about Linux compatibility though.

Another approach which may be slightly better than the code in your
last patch is to reopen sockets before spawning each worker
process: this way, the master may keep listen sockets open (the
listen queue is then shared with the same socket as inherited by a
worker process, right?) and worker processes are equal and don't
need to open sockets themselves.  It needs careful handling on the
dead process respawn code path though.
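
A minimal sketch of this approach, in Python for brevity rather than
nginx's C.  The names open_reuseport_listener, spawn_workers and
worker_main are hypothetical, and it assumes a platform where
socket.SO_REUSEPORT is defined and lets several sockets bind to the
same address; it doesn't cover DragonFly-specific kernel behaviour:

```python
import os
import socket


def open_reuseport_listener(host, port, backlog=128):
    """Create a new listening socket that shares host:port via SO_REUSEPORT."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Assumption: SO_REUSEPORT is available on this platform.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((host, port))
    s.listen(backlog)
    return s


def spawn_workers(host, port, count, worker_main):
    """Open a fresh listener before each fork; the master keeps all of them open."""
    workers = {}  # pid -> listener kept open by the master
    for _ in range(count):
        listener = open_reuseport_listener(host, port)
        pid = os.fork()
        if pid == 0:
            # Worker: close listeners inherited for earlier workers,
            # keep only the one opened for this worker.
            for other in workers.values():
                other.close()
            try:
                worker_main(listener)
            finally:
                os._exit(0)
        workers[pid] = listener
    return workers
```

Since the master keeps the descriptor opened for each worker, a respawn
could hand the dead worker's listener to its replacement instead of
opening a new one, but that is exactly the part which needs care.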

> > It probably should either be left in the state "nothing is
> > guaranteed" (with some understanding of what will happen in
> > various common situations like reconfiguration, upgrade, switching
> > so_reuseport on/off) or some way should be found to make things
> > less tricky.
> >
> 
> To be frank, at least interfering with reconfiguration is probably not
> wanted.  And I don't want "nothing is guaranteed" (which probably is the
> first 2 patches).

As far as I can tell, reconfiguration should just work with the
in-kernel inheritance you've implemented, as new worker
processes are spawned before old worker processes are shut down.
There may be races though.

[...]

> > (We've also discussed this here in the office several times, and it
> > seems that the general consensus is that SO_REUSEPORT for TCP balancing
> > isn't really a good interface.  It would be much easier for everyone
> > if the normal workflow with inherited listen socket descriptors just
> > worked.  Especially given the fact that in the nginx case it's mostly
> > about benchmarking, since in real life load distribution between
> > worker processes is good enough.)
> 
> 
> In DragonFly, SO_REUSEPORT is more than load balancing: it makes the
> network processing of accepted sockets completely CPU localized (from user
> land to kernel land on both RX and TX paths).  This level of network
> processing CPU localization could not be achieved by the old listen socket
> inheritance usage model (even if I could divide the listen socket's
> completion queue per CPU based on RX hash, the level of CPU localization
> achieved by SO_REUSEPORT still could not be achieved easily).

Could you please point out how it's achieved?

We here tend to think that the proper interface from an application
point of view would be a socket option which basically creates
separate listen queues for inherited sockets.
But if this isn't going to work, it's probably better to focus on
SO_REUSEPORT.

BTW, are you going to be at the upcoming EuroBSDcon?  I'm not, but
Igor and Gleb Smirnoff (glebius at freebsd.org) will be there, and it
would be cool if you could meet and discuss the SO_REUSEPORT usage for
balancing.

-- 
Maxim Dounin
http://nginx.org/en/donation.html



More information about the nginx-devel mailing list