<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi all,<br>
<br>
I occasionally have trouble with worker processes being left
&lt;defunct&gt; and nginx no longer handling signals (HUP and even
TERM) at all.<br>
<br>
After a reconfigure signal (SIGHUP), the log shows four new worker
processes being spawned while the four old ones shut down:<br>
<tt><br>
> </tt><tt>[notice] 5159#0: using the "epoll" event method</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: nginx/1.4.1</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: built by gcc 4.4.3 (Ubuntu
4.4.3-4ubuntu5.1)</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: OS: Linux 3.9.7-147-x86</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: getrlimit(RLIMIT_NOFILE):
100000:100000</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: start worker processes</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: start worker process 5330</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: start worker process 5331</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: start worker process 5332</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: start worker process 5333</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: signal 1 (SIGHUP) received,
reconfiguring</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: reconfiguring</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: using the "epoll" event
method</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: start worker processes</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: start worker process 12457</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: start worker process 12458</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: start worker process 12459</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: start worker process 12460</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: start cache manager process
12461</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: start cache loader process
12462</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5331#0: gracefully shutting down</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5330#0: gracefully shutting down</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5331#0: exiting</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5330#0: exiting</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5331#0: exit</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5330#0: exit</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5332#0: gracefully shutting down</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: signal 17 (SIGCHLD)
received </tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5159#0: worker process 5331 exited
with code 0</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5332#0: exiting</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5332#0: exit</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5333#0: gracefully shutting down</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5333#0: exiting</tt><tt><br>
</tt><tt><tt>> </tt>[notice] 5333#0: exit<br>
<br>
</tt>After that, nginx is fully operational and serving requests.
However, ps shows:<br>
<tt><br>
> root 5159 0.0 0.0 6248 1696 ? Ss 10:43 0:00 nginx:
master process /chroots/nginx/nginx -c
/chroots/nginx/conf/nginx.conf</tt><tt><br>
</tt><tt>> nobody 5330 0.0 0.0 0 0 ? Z 10:43 0:00
[nginx] &lt;defunct&gt;</tt><tt><br>
</tt><tt>> nobody 5332 0.0 0.0 0 0 ? Z 10:43 0:00
[nginx] &lt;defunct&gt;</tt><tt><br>
</tt><tt>> nobody 5333 0.0 0.0 0 0 ? Z 10:43 0:00
[nginx] &lt;defunct&gt;</tt><tt><br>
</tt><tt>> nobody 12457 0.0 0.0 8332 2940 ? S 10:44 0:00
nginx: worker process</tt><tt><br>
</tt><tt>> nobody 12458 0.0 0.0 8332 2940 ? S 10:44 0:00
nginx: worker process</tt><tt><br>
</tt><tt>> nobody 12459 0.0 0.0 8332 3544 ? S 10:44 0:00
nginx: worker process</tt><tt><br>
</tt><tt>> nobody 12460 0.0 0.0 8332 2940 ? S 10:44 0:00
nginx: worker process</tt><tt><br>
</tt><tt>> nobody 12461 0.0 0.0 6296 1068 ? S 10:44 0:00
nginx: cache manager process</tt><tt><br>
</tt><tt>> nobody 12462 0.0 0.0 0 0 ? Z 10:44 0:00
[nginx] &lt;defunct&gt;</tt><br>
<br>
The log shows that SIGCHLD was received only once, for 5331, which
does not show up as a zombie, in contrast to workers 5330, 5332 and
5333 and the cache loader 12462.<br>
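<br>
As far as I understand, a single SIGCHLD entry would not be unusual by
itself: standard signals are not queued, so several child exits can
collapse into one delivery, and a parent normally copes by calling
waitpid() in a non-blocking loop until nothing is left. The remaining
zombies mean those children were never reaped. Below is a minimal Python
sketch of that generic reaping pattern. It is not nginx's actual code, and
the function names are made up for illustration:<br>
<pre>
```python
import os
import time


def spawn_children(count):
    """Fork `count` children that exit immediately; return their PIDs."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exits at once and stays a zombie until reaped
        pids.append(pid)
    return pids


def reap_all_exited():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left at all
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    children = spawn_children(4)
    time.sleep(0.5)  # give all four children time to exit
    # One pass of the loop collects all of them, however many
    # SIGCHLD deliveries the kernel actually produced.
    print(sorted(reap_all_exited()) == sorted(children))
```
</pre>
If the parent stops before or inside such a loop, every child that exits
afterwards stays in state Z, which matches the ps output above.<br>
<br>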
Much more serious is that neither<span class="ansi0 bgAnsi15"
style=""><br>
<tt><br>
> /chroots/nginx/nginx -c /chroots/nginx/conf/nginx.conf -s</tt></span><tt>
(stop|reload</tt><tt>)</tt><tt><br>
</tt><tt><br>
</tt>nor<tt><br>
<br>
> kill 5159 </tt><br>
<br>
seems to be handled by nginx anymore (nothing in the log and no
effect). Maybe the master process is stuck waiting on a mutex:<span
class="ansi0 bgAnsi15" style=""><br>
<tt><br>
</tt><tt>> strace -p 5159</tt></span><tt><span class="ansi0
bgAnsi15" style=""><br>
> Process 5159 attached - interrupt to quit</span></tt>
<div style="height: 19px;"><tt><span class="ansi0 bgAnsi15" style="">>
futex(0xb7658e6c, FUTEX_WAIT_PRIVATE, 2, NULL</span></tt></div>
<tt>
</tt><br>
Unfortunately, I failed to capture a core dump of the master process
while it was running. There is also no debug log available, sorry.
Since I have not been able to reproduce this issue reliably, I will
most probably have to wait until it happens again.<br>
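<br>
If the hang recurs, one thing I could check even without a core dump is
whether HUP and TERM are sitting in the master's pending or blocked
signal sets. Below is a rough Python sketch, assuming Linux's
/proc/PID/status with its SigPnd, ShdPnd, SigBlk and SigCgt lines. The
helper names are my own and nothing here is nginx-specific:<br>
<pre>
```python
import signal


def decode_mask(hex_mask):
    """Turn a hex signal mask from /proc/PID/status into signal numbers."""
    value = int(hex_mask, 16)
    # Bit n-1 being set means that signal n is a member of the set.
    return [n for n in range(1, 65) if (value >> (n - 1)) % 2 == 1]


def label(number):
    """Return the symbolic name of a signal, or the bare number as text."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return str(number)  # e.g. real-time signals without a Python name


def signal_sets(pid):
    """Read the pending, blocked and caught signal sets of one process."""
    wanted = ("SigPnd", "ShdPnd", "SigBlk", "SigCgt")
    sets = {}
    with open("/proc/%d/status" % pid) as status_file:
        for line in status_file:
            name, _, rest = line.partition(":")
            if name in wanted:
                sets[name] = [label(n) for n in decode_mask(rest.strip())]
    return sets
```
</pre>
Calling <tt>signal_sets(5159)</tt> on the stuck master would then show
whether SIGHUP (1) and SIGTERM (15) appear under SigPnd or ShdPnd, that
is, whether they were delivered to the process but never handled.<br>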
<br>
Many thanks in advance and kind regards,<br>
Florian<br>
<br>
</body>
</html>