[PATCH] os/unix: don't stop old workers if reload failed

Thu Dec 17 07:20:19 UTC 2020

在 2020-12-10星期四的 22:47 +0300，Maxim Dounin写道：
> Hello!
> 
> On Fri, Dec 04, 2020 at 12:29:51PM +0800, kasei at kasei.im wrote:
> 
> > *Sorry for the format issue, resending it.
> >   Hello, We found sometimes nginx might failed to start new worker
> > processes
> > during reconfiguring, (the number of processes exceeds
> > NGX_MAX_PROCESSES for
> > example). In that case, the master process still send QUIT signal
> > to old
> > workers, leads to the situation that there is no worker processes
> > accept socket
> > from listening sockets.
> >   And we found actually there is a return code in os/win32's
> > ngx_start_worker_processes funcation. It skip quiting workers if
> > there is no
> > worker processes be spawned. So I written this patch followed
> > win32's
> > implementation and tested it. Cloud you please check this patch to
> > see if there
> > is anything missed? Thanks very much.
> 
> Thanks for the patch.
> 
> The win32 implementation works this way because spawning new 
> processes on Windows might easily fail for multiple reasons as 
> there is no fork().  On the other hand, this is not expected to 
> ever happen on Unix systems, except may be hitting variouos 
> limits, either nginx own NGX_MAX_PROCESSES limit or system limits.  
> That is, on Unix systems starting worker processes more or less is 
> only expected to fail in case of severe misconfiguration.
> 
> Further, I'm not sure that current win32 behaviour is a safe one, 
> as resulting state is inconsistent: worker processes does not 
> match master configuration.  This probably doesn't mater for win32 
> as starting worker processes is anyway racy, but might make things 
> worse on Unix systems.
> 
> And, as your own TODO mentions, the approach "do not stop old 
> worker processes if no new worker processes were started" won't 
> prevent degraded service if some worker processes were not able to 
> start and listening sockets with the reuseport option are used.
> 
> Could you please clarify - are you trying to solve some problem 
> you are facing in practice?  If yes, could you please provide some 
> more details?
> 
Hello, thanks for your reply! 

Yes, I am tring slove a problem in production environment. In short,
sometimes we reload nginx in a high frquency like 2-3 minutes, then
some long connections may holds many shutting down workers. If the
number of existing workers exceeds NGX_MAX_PROCESSES, nginx can't spawn
new workers. Yes, I understand it's an abuse of nginx, and the
frequency of reloading should be reduced. But I'm thinking it might
better to let nginx works with old config (equals reloading failed)
instead of stop accpeting new connections.