Worker processes not shutting down

Tue Sep 23 22:07:17 UTC 2014

Hello!

On Fri, Sep 19, 2014 at 12:50 PM, igorhmm wrote:
> I don't know how to reproduce it, not yet :-)
>
> I couldn't identify which worker was responding either, but with strace I
> can see EAGAIN (Resource temporarily unavailable) warnings in the old
> worker. I can see that because the old workers are still running:
>

Nginx workers usually take forever to quit because of pending timers.

One suggestion is to dump the handlers of all pending timers so that
we can tell which parts of nginx are responsible. More specifically,
you can traverse the rbtree rooted at the C global variable
"ngx_event_timer_rbtree"; for each tree node, obtain the enclosing
ngx_event_t object with the pointer arithmetic "(ngx_event_t *) ((char *)
cur - offsetof(ngx_event_t, timer))", then check the function pointed
to by "ev->handler" [1]. All of these checks can be done in a gdb script
or a systemtap script that inspects a typical nginx worker that is still
waiting to shut down.
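To make the traversal concrete, here is a minimal C sketch of the same idea: an in-order walk over a sentinel-terminated binary tree that recovers each enclosing event from its embedded timer node via offsetof and hands it to a visitor, which can then inspect ev->handler. The types rbnode_t, rbtree_t and event_t and the functions event_from_timer and walk_timers are hypothetical, simplified stand-ins modeled on nginx's ngx_rbtree_node_t, ngx_rbtree_t and ngx_event_t; the real structs have more fields. Against a live worker you would express the same walk in gdb Python or stap rather than compiling it.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for nginx's ngx_rbtree_node_t,
 * ngx_rbtree_t and ngx_event_t; the real nginx structs have more fields. */
typedef struct rbnode_s rbnode_t;
struct rbnode_s {
    unsigned long key;      /* timer expiry time (msec) */
    rbnode_t     *left;
    rbnode_t     *right;
};

typedef struct {
    rbnode_t *root;
    rbnode_t *sentinel;     /* leaves point here instead of NULL, as in nginx */
} rbtree_t;

typedef struct event_s event_t;
typedef void (*event_handler_pt)(event_t *ev);

struct event_s {
    void             *data;
    event_handler_pt  handler;  /* the function we want to identify */
    rbnode_t          timer;    /* embedded tree node, like ngx_event_t.timer */
};

typedef void (*timer_visit_pt)(event_t *ev, void *ctx);

/* Recover the enclosing event from a pointer to its embedded timer node. */
static event_t *event_from_timer(rbnode_t *node)
{
    return (event_t *) ((char *) node - offsetof(event_t, timer));
}

/* In-order walk of one subtree; returns the number of nodes visited. */
static size_t walk_subtree(rbnode_t *node, rbnode_t *sentinel,
                           timer_visit_pt visit, void *ctx)
{
    size_t n;

    if (node == sentinel) {
        return 0;
    }

    n = walk_subtree(node->left, sentinel, visit, ctx);
    visit(event_from_timer(node), ctx);   /* inspect ev->handler here */
    n += 1;
    n += walk_subtree(node->right, sentinel, visit, ctx);
    return n;
}

/* Visit every pending timer in expiry order; an empty tree has root == sentinel. */
static size_t walk_timers(rbtree_t *tree, timer_visit_pt visit, void *ctx)
{
    return walk_subtree(tree->root, tree->sentinel, visit, ctx);
}
```

In gdb, printing a function pointer such as ev->handler normally shows the symbol name next to the address, and "info symbol" resolves a raw address, which is what tells you which module owns each lingering timer.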

[1] For an example, see this piece of C code in the ngx_lua module:
https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_timer.c#L465
You will need to rewrite it in gdb's Python extension language or in
systemtap's stap scripting language for online dynamic tracing.