nginx stops handling SIGTERM and leaves zombie processes after reconfigure

Wed Jul 3 14:48:29 UTC 2013

Hi all,

I'm occasionally having trouble with worker processes being left 
<defunct> and nginx no longer handling signals (HUP and even TERM) at all.

After a reconfigure signal (SIGHUP), the log shows four new worker 
processes being spawned while the four old ones shut down:

 > [notice] 5159#0: using the "epoll" event method
 > [notice] 5159#0: nginx/1.4.1
 > [notice] 5159#0: built by gcc 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1)
 > [notice] 5159#0: OS: Linux 3.9.7-147-x86
 > [notice] 5159#0: getrlimit(RLIMIT_NOFILE): 100000:100000
 > [notice] 5159#0: start worker processes
 > [notice] 5159#0: start worker process 5330
 > [notice] 5159#0: start worker process 5331
 > [notice] 5159#0: start worker process 5332
 > [notice] 5159#0: start worker process 5333
 > [notice] 5159#0: signal 1 (SIGHUP) received, reconfiguring
 > [notice] 5159#0: reconfiguring
 > [notice] 5159#0: using the "epoll" event method
 > [notice] 5159#0: start worker processes
 > [notice] 5159#0: start worker process 12457
 > [notice] 5159#0: start worker process 12458
 > [notice] 5159#0: start worker process 12459
 > [notice] 5159#0: start worker process 12460
 > [notice] 5159#0: start cache manager process 12461
 > [notice] 5159#0: start cache loader process 12462
 > [notice] 5331#0: gracefully shutting down
 > [notice] 5330#0: gracefully shutting down
 > [notice] 5331#0: exiting
 > [notice] 5330#0: exiting
 > [notice] 5331#0: exit
 > [notice] 5330#0: exit
 > [notice] 5332#0: gracefully shutting down
 > [notice] 5159#0: signal 17 (SIGCHLD) received
 > [notice] 5159#0: worker process 5331 exited with code 0
 > [notice] 5332#0: exiting
 > [notice] 5332#0: exit
 > [notice] 5333#0: gracefully shutting down
 > [notice] 5333#0: exiting
 > [notice] 5333#0: exit

After that, nginx is fully operational and serving requests -- however, 
ps yields:

 > root    5159 0.0 0.0 6248 1696 ?     Ss 10:43 0:00 nginx: master process /chroots/nginx/nginx -c /chroots/nginx/conf/nginx.conf
 > nobody  5330 0.0 0.0    0    0 ?     Z  10:43 0:00 [nginx] <defunct>
 > nobody  5332 0.0 0.0    0    0 ?     Z  10:43 0:00 [nginx] <defunct>
 > nobody  5333 0.0 0.0    0    0 ?     Z  10:43 0:00 [nginx] <defunct>
 > nobody 12457 0.0 0.0 8332 2940 ?     S  10:44 0:00 nginx: worker process
 > nobody 12458 0.0 0.0 8332 2940 ?     S  10:44 0:00 nginx: worker process
 > nobody 12459 0.0 0.0 8332 3544 ?     S  10:44 0:00 nginx: worker process
 > nobody 12460 0.0 0.0 8332 2940 ?     S  10:44 0:00 nginx: worker process
 > nobody 12461 0.0 0.0 6296 1068 ?     S  10:44 0:00 nginx: cache manager process
 > nobody 12462 0.0 0.0    0    0 ?     Z  10:44 0:00 [nginx] <defunct>

In the log one can see that SIGCHLD was received only once and that only 
the exit of 5331 was logged. 5331 is also the only old process that does 
not show up as a zombie, in contrast to the workers 5330, 5332, 5333 and 
the cache loader 12462.
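
As far as I understand, standard signals are not queued, so a single 
SIGCHLD can stand for several exited children, and whoever handles it has 
to keep calling waitpid() until nothing is left to collect. Below is a 
minimal Python sketch of that pattern, only to illustrate what I would 
expect to happen after the one SIGCHLD in the log. It is not nginx's code, 
and the helper names reap_exited_children and spawn_children are made up 
for this mail.

```python
import os
import time


def reap_exited_children():
    """Collect every child that has already exited, without blocking.

    Returns a list of (pid, exit_code) tuples; exit_code is None when the
    child did not exit normally (for example, it was killed by a signal).
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # This process has no children left at all.
            break
        if pid == 0:
            # Children still exist, but none of them has exited yet.
            break
        code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
        reaped.append((pid, code))
    return reaped


def spawn_children(count):
    """Fork `count` children that exit immediately with code 0."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit right away, parent must reap it
        pids.append(pid)
    return pids


if __name__ == "__main__":
    started = spawn_children(4)
    time.sleep(0.5)  # crude: give the children time to exit
    print("started:", started)
    print("reaped: ", reap_exited_children())
```

With a loop like this, one notification is enough to collect all four old 
workers. In my case only one child was collected, which makes me think the 
master never got past handling that first SIGCHLD.
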
Much more serious is that neither

 > /chroots/nginx/nginx -c /chroots/nginx/conf/nginx.conf -s (stop|reload)

nor

 > kill 5159

seems to be handled by nginx anymore (nothing in the log and no effect). 
Maybe the master process is stuck waiting on some mutex? strace shows:

 > strace -p 5159
 > Process 5159 attached - interrupt to quit
 > futex(0xb7658e6c, FUTEX_WAIT_PRIVATE, 2, NULL

Unfortunately, I failed to get a core dump of the master process while 
it was still running, and there is no debug log available either, sorry. 
As I have not been able to reproduce this issue reliably, I will most 
probably have to wait until it happens again.
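
One thing I could additionally record next time is the signal state of 
the master as the kernel sees it. On Linux, /proc/<pid>/status contains 
the hex masks SigPnd, ShdPnd, SigBlk, SigIgn and SigCgt, where bit 0 
stands for signal 1, bit 1 for signal 2, and so on. The following is a 
rough Python sketch for decoding them; the function names are my own, and 
I have not run it against a hung master yet.

```python
import signal

# Signal-related fields of /proc/<pid>/status on Linux.
MASK_FIELDS = ("SigPnd", "ShdPnd", "SigBlk", "SigIgn", "SigCgt")


def decode_sigmask(hex_mask):
    """Turn a hex mask from /proc/<pid>/status into signal numbers.

    Bit 0 of the mask stands for signal 1, bit 1 for signal 2, etc.
    """
    value = int(hex_mask, 16)
    return [bit + 1 for bit in range(value.bit_length())
            if (value >> bit) & 1]


def parse_signal_masks(status_text):
    """Extract the signal mask fields from the text of a status file."""
    masks = {}
    for line in status_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in MASK_FIELDS:
            masks[name] = decode_sigmask(rest.strip())
    return masks


def signal_name(number):
    """Map a signal number to its name, falling back to 'SIG<n>'."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return "SIG%d" % number


def read_signal_masks(pid):
    """Read and decode the signal masks of a running process (Linux)."""
    with open("/proc/%d/status" % pid) as handle:
        return parse_signal_masks(handle.read())
```

Calling read_signal_masks(5159) as root would then show whether signals 1 
(SIGHUP), 15 (SIGTERM) and 17 (SIGCHLD) are listed under SigPnd or ShdPnd 
(delivered to the master but not yet handled) or under SigBlk (blocked), 
which should help to tell a blocked handler from signals that never arrive.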

Many thanks in advance and kind regards,
Florian
