Question about http_stub_status

Tue Aug 26 20:25:24 MSD 2008

On Sat, Aug 23, 2008 at 01:18:07PM +0200, Marcus Bianchy wrote:

> > This means that either someone killed nginx workers using SIGTERM/INT/KILL
> > or workers exited abnormally. Could you run
> >
> > grep alert error_log
> Well, I can say that no one on our team sends such signals around...
> We've been observing strange signal 8 (SIGFPE) errors lately.
> A typical "grep/zgrep signal" of our error.logs shows something
> like this:
> 
> ############ snip ############
> 2008/08/22 10:09:42 [notice] 28631#0: signal 17 (SIGCHLD) received
> 2008/08/22 10:09:42 [alert] 28631#0: worker process 27809 exited on signal 8
> 2008/08/22 10:09:42 [notice] 28631#0: signal 29 (SIGIO) received
> 2008/08/22 12:58:06 [notice] 28631#0: signal 17 (SIGCHLD) received
> 2008/08/22 12:58:06 [alert] 28631#0: worker process 27810 exited on signal 8
> 2008/08/22 12:58:06 [notice] 28631#0: signal 29 (SIGIO) received
> 2008/08/22 12:58:06 [notice] 28631#0: signal 17 (SIGCHLD) received
> 2008/08/22 12:58:06 [alert] 28631#0: worker process 32013 exited on signal 8
> 2008/08/22 12:58:06 [notice] 28631#0: signal 29 (SIGIO) received
> 2008/08/22 12:58:11 [notice] 28631#0: signal 17 (SIGCHLD) received
> 2008/08/22 12:58:11 [alert] 28631#0: worker process 27811 exited on signal 8
> 2008/08/22 12:58:11 [notice] 28631#0: signal 29 (SIGIO) received
> 2008/08/22 12:58:20 [notice] 28631#0: signal 17 (SIGCHLD) received
> 2008/08/22 12:58:20 [alert] 28631#0: worker process 785 exited on signal 8
> 2008/08/22 12:58:20 [notice] 28631#0: signal 29 (SIGIO) received
> 2008/08/22 12:59:36 [notice] 28631#0: signal 17 (SIGCHLD) received
> 2008/08/22 12:59:36 [alert] 28631#0: worker process 1342 exited on signal 8
> 2008/08/22 12:59:36 [notice] 28631#0: signal 29 (SIGIO) received
> 2008/08/22 13:00:06 [notice] 28631#0: signal 17 (SIGCHLD) received
> 2008/08/22 13:00:06 [alert] 28631#0: worker process 1343 exited on signal 8
> 2008/08/22 13:00:06 [notice] 28631#0: signal 29 (SIGIO) received
> 2008/08/23 04:02:18 [notice] 28631#0: signal 17 (SIGCHLD) received
> 2008/08/23 04:02:18 [alert] 28631#0: worker process 1344 exited on signal 8
> 2008/08/23 04:02:18 [notice] 28631#0: signal 29 (SIGIO) received
> ################## snip #############
> 
> The logrotate runs at 04:00 in the morning, which would explain the
> SIGCHLD/SIGFPE at 04:02:18. But the real problem is the signals at
> around 1pm; neither the access.log nor the error.log gives any hint
> as to what produces this behaviour. And guess what: yesterday
> at 1pm the values for active/waiting connections increased to
> ~30000/35000.
> 
> Maybe it's a good idea to allow core dumps to find out exactly what
> causes these signals?

It seems that you have "max_fails=0" in some upstream.
Maxim's recent patch fixes the bug; alternatively, you may try nginx-0.7.12.
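
For illustration only, an upstream block carrying that parameter might look
roughly like the sketch below. The upstream name "backend" and the server
addresses are hypothetical placeholders, not taken from the reporter's
configuration. SIGFPE is the signal typically raised on an integer division
by zero, which would be consistent with a zero max_fails value being used as
a divisor; that is an inference, not something stated in this thread.

```nginx
# Hypothetical example: name and addresses are placeholders.
upstream backend {
    server 10.0.0.1:8080 max_fails=0;   # the parameter mentioned above
    server 10.0.0.2:8080;
}
```

Searching the configuration for "max_fails=0" should show whether any
upstream is affected.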

-- 
Igor Sysoev
http://sysoev.ru/en/