[PATCH] os/unix: don't stop old workers if reload failed

Maxim Dounin mdounin at mdounin.ru
Thu Dec 17 13:27:48 UTC 2020


On Thu, Dec 17, 2020 at 03:20:19PM +0800, kasei at kasei.im wrote:

> 在 2020-12-10星期四的 22:47 +0300,Maxim Dounin写道:
> > Hello!
> > 
> > On Fri, Dec 04, 2020 at 12:29:51PM +0800, kasei at kasei.im wrote:
> > 
> > > *Sorry for the format issue, resending it.
> > >   Hello, We found sometimes nginx might failed to start new worker
> > > processes
> > > during reconfiguring, (the number of processes exceeds
> > > example). In that case, the master process still send QUIT signal
> > > to old
> > > workers, leads to the situation that there is no worker processes
> > > accept socket
> > > from listening sockets.
> > >   And we found actually there is a return code in os/win32's
> > > ngx_start_worker_processes funcation. It skip quiting workers if
> > > there is no
> > > worker processes be spawned. So I written this patch followed
> > > win32's
> > > implementation and tested it. Cloud you please check this patch to
> > > see if there
> > > is anything missed? Thanks very much.
> > 
> > Thanks for the patch.
> > 
> > The win32 implementation works this way because spawning new 
> > processes on Windows might easily fail for multiple reasons as 
> > there is no fork().  On the other hand, this is not expected to 
> > ever happen on Unix systems, except may be hitting variouos 
> > limits, either nginx own NGX_MAX_PROCESSES limit or system limits.  
> > That is, on Unix systems starting worker processes more or less is 
> > only expected to fail in case of severe misconfiguration.
> > 
> > Further, I'm not sure that current win32 behaviour is a safe one, 
> > as resulting state is inconsistent: worker processes does not 
> > match master configuration.  This probably doesn't mater for win32 
> > as starting worker processes is anyway racy, but might make things 
> > worse on Unix systems.
> > 
> > And, as your own TODO mentions, the approach "do not stop old 
> > worker processes if no new worker processes were started" won't 
> > prevent degraded service if some worker processes were not able to 
> > start and listening sockets with the reuseport option are used.
> > 
> > Could you please clarify - are you trying to solve some problem 
> > you are facing in practice?  If yes, could you please provide some 
> > more details?
> > 
> Hello, thanks for your reply! 
> Yes, I am tring slove a problem in production environment. In short,
> sometimes we reload nginx in a high frquency like 2-3 minutes, then
> some long connections may holds many shutting down workers. If the
> number of existing workers exceeds NGX_MAX_PROCESSES, nginx can't spawn
> new workers. Yes, I understand it's an abuse of nginx, and the
> frequency of reloading should be reduced. But I'm thinking it might
> better to let nginx works with old config (equals reloading failed)
> instead of stop accpeting new connections. 

Ok, thanks for the details.

As previously suggested, it looks more like a misconfiguration to 
me.  And if we want to prevent it from happening by nginx, a 
better solution might be to complain and fail earlier, before 
applying the new configuration.  For example, nginx might refuse 
to reload if there are more worker processes configured than free 
process slots available, similarly to how it currently refuses to 
upgrade if previous binary upgrade isn't complete yet.  I'm not 
sure it worth the effort though.

In your particular case it may make sense to avoid reloading nginx 
if some number of previous reloads weren't complete yet.  E.g., 
you may want to modify your reload scripts (or procedures, if 
reloading is manual) to avoid reloading nginx if there are, say, 
more than 100 worker processes are still running.

Alternatively, you may want to configure worker_shutdown_timeout 
(http://nginx.org/r/worker_shutdown_timeout) to a reasonable 
value.  It will make sure that old worker processes are terminated 
in a reasonable time even if there are active connections.

Maxim Dounin

More information about the nginx-devel mailing list