Stop handling SIGTERM and zombie processes after reconfigure

Maxim Dounin mdounin at mdounin.ru
Wed Jul 3 15:38:06 UTC 2013


Hello!

On Wed, Jul 03, 2013 at 04:48:29PM +0200, Florian S. wrote:

> Hi together!
> 
> I'm occasionally having trouble with worker processes left <defunct>
> and nginx ceasing to handle signals (HUP and even TERM) altogether.
> 
> Upon a reconfigure signal (SIGHUP), the log shows four new processes
> being spawned while the old four processes are shutting down:
> 
> > [notice] 5159#0: using the "epoll" event method
> > [notice] 5159#0: nginx/1.4.1
> > [notice] 5159#0: built by gcc 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1)
> > [notice] 5159#0: OS: Linux 3.9.7-147-x86
> > [notice] 5159#0: getrlimit(RLIMIT_NOFILE): 100000:100000
> > [notice] 5159#0: start worker processes
> > [notice] 5159#0: start worker process 5330
> > [notice] 5159#0: start worker process 5331
> > [notice] 5159#0: start worker process 5332
> > [notice] 5159#0: start worker process 5333
> > [notice] 5159#0: signal 1 (SIGHUP) received, reconfiguring
> > [notice] 5159#0: reconfiguring
> > [notice] 5159#0: using the "epoll" event method
> > [notice] 5159#0: start worker processes
> > [notice] 5159#0: start worker process 12457
> > [notice] 5159#0: start worker process 12458
> > [notice] 5159#0: start worker process 12459
> > [notice] 5159#0: start worker process 12460
> > [notice] 5159#0: start cache manager process 12461
> > [notice] 5159#0: start cache loader process 12462
> > [notice] 5331#0: gracefully shutting down
> > [notice] 5330#0: gracefully shutting down
> > [notice] 5331#0: exiting
> > [notice] 5330#0: exiting
> > [notice] 5331#0: exit
> > [notice] 5330#0: exit
> > [notice] 5332#0: gracefully shutting down
> > [notice] 5159#0: signal 17 (SIGCHLD) received
> > [notice] 5159#0: worker process 5331 exited with code 0
> > [notice] 5332#0: exiting
> > [notice] 5332#0: exit
> > [notice] 5333#0: gracefully shutting down
> > [notice] 5333#0: exiting
> > [notice] 5333#0: exit
> 
> After that, nginx is fully operational and serving requests --
> however, ps yields:
> 
> > root    5159 0.0 0.0 6248 1696 ?     Ss 10:43 0:00 nginx: master
> process /chroots/nginx/nginx -c /chroots/nginx/conf/nginx.conf
> > nobody  5330 0.0 0.0    0    0 ?     Z  10:43 0:00 [nginx] <defunct>
> > nobody  5332 0.0 0.0    0    0 ?     Z  10:43 0:00 [nginx] <defunct>
> > nobody  5333 0.0 0.0    0    0 ?     Z  10:43 0:00 [nginx] <defunct>
> > nobody 12457 0.0 0.0 8332 2940 ?     S  10:44 0:00 nginx: worker process
> > nobody 12458 0.0 0.0 8332 2940 ?     S  10:44 0:00 nginx: worker process
> > nobody 12459 0.0 0.0 8332 3544 ?     S  10:44 0:00 nginx: worker process
> > nobody 12460 0.0 0.0 8332 2940 ?     S  10:44 0:00 nginx: worker process
> > nobody 12461 0.0 0.0 6296 1068 ?     S  10:44 0:00 nginx: cache
> manager process
> > nobody 12462 0.0 0.0    0    0 ?     Z  10:44 0:00 [nginx] <defunct>
> 
> In the log one can see that SIGCHLD is received only once, for 5331,
> which does not show up as a zombie -- in contrast to the workers 5330,
> 5332, 5333, and the cache loader 12462.
> Much more serious is that neither
> 
> > /chroots/nginx/nginx -c /chroots/nginx/conf/nginx.conf -s(stop|reload)
> 
> nor
> 
> > kill 5159
> 
> seems to be handled by nginx anymore (nothing in the log and no
> effect). Maybe the master process is stuck waiting for some mutex?:
> 
> > strace -p 5159
> > Process 5159 attached - interrupt to quit
> > futex(0xb7658e6c, FUTEX_WAIT_PRIVATE, 2, NULL
> 
> Unfortunately, I failed to get a core dump of the master process
> while it was running. Additionally, there is no debug log available,
> sorry. As I was not able to reliably reproduce this issue, I'll most
> probably have to wait...

It indeed looks like the master process is blocked somewhere.  It
would be interesting to see a stack trace of the master process when
this happens.
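
For the record, one way to capture such a trace (assuming gdb is
available; 5159 is the master PID from the report above, and debug
symbols make the output far more useful):

```shell
# Attach to the stuck master and dump a full backtrace non-interactively.
gdb -p 5159 -batch -ex 'thread apply all backtrace full'

# On Linux the kernel-side stack is also visible (as root):
cat /proc/5159/stack
```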

(It's also a good idea to make sure there are no 3rd-party
modules/patches involved, just in case.)

-- 
Maxim Dounin
http://nginx.org/en/donation.html


