[PATCH 21 of 31] Fix cpu hog with all upstream servers marked "down"
Oded Arbel
oded at geek.co.il
Mon Aug 15 14:59:36 UTC 2011
Regarding the above-mentioned patch (also quoted below), I wanted to provide some feedback from our production setup.
We run several reverse proxy servers with Nginx, forwarding requests to upstream servers. Our configuration looks like this:
upstream trc {
    server prod2-f1:10213 max_fails=500 fail_timeout=30s;
    server prod2-f2:10213 max_fails=500 fail_timeout=30s;
    ...
    server 127.0.0.1:10213 backup;
    ip_hash;
}
We've noticed that every once in a while (about 5-10 times a week) one of the servers gets into a state where an Nginx worker starts eating 100% CPU and requests time out. I applied the aforementioned patch to our Nginx installation (release 1.0.0 with the Nginx_Upstream_Hash patch) and deployed it to our production servers. After a few hours, the Nginx workers on all the servers were eating 100% CPU.
Attaching gdb to one of the problematic workers, I got this backtrace:
#0 0x000000000044a650 in ngx_http_upstream_get_round_robin_peer ()
#1 0x00000000004253dc in ngx_event_connect_peer ()
#2 0x0000000000448618 in ngx_http_upstream_connect ()
#3 0x0000000000448e10 in ngx_http_upstream_process_header ()
#4 0x00000000004471fb in ngx_http_upstream_handler ()
#5 0x00000000004247fa in ngx_event_expire_timers ()
#6 0x00000000004246ed in ngx_process_events_and_timers ()
#7 0x000000000042a048 in ngx_worker_process_cycle ()
#8 0x00000000004287e0 in ngx_spawn_process ()
#9 0x000000000042963c in ngx_start_worker_processes ()
#10 0x000000000042a5d5 in ngx_master_process_cycle ()
#11 0x0000000000410adf in main ()
I then tried stepping through the running worker with the GDB command "next", which printed:
Single stepping until exit from function ngx_http_upstream_get_round_robin_peer
It never returned, and eventually I gave up and interrupted it.
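For anyone following along, this is how I read the idea behind the patch quoted below, written as a simplified standalone sketch rather than the real nginx code. The peer_t type and the pick_peer function are made up for illustration, and the per-pick decrement is my own simplification. It only shows how a "refill and retry" loop can spin when no peer has any weight, and how a one-shot reset counter stops it. It says nothing about whether this is the loop my workers are actually stuck in.

```c
#include <stddef.h>

/* Hypothetical stand-in for a round-robin peer; NOT the nginx struct. */
typedef struct {
    int weight;          /* configured weight; 0 models a peer that is never eligible */
    int current_weight;  /* weight remaining in the current round */
} peer_t;

/*
 * Return the index of the first peer with weight left in this round.
 * When the round is exhausted, refill every peer and retry.
 * The reset counter follows the idea of the quoted patch: if one refill
 * still leaves no eligible peer, return 0 instead of looping forever.
 */
static size_t
pick_peer(peer_t *peer, size_t number)
{
    size_t  i;
    int     reset = 0;

    for ( ;; ) {

        for (i = 0; i < number; i++) {
            if (peer[i].current_weight > 0) {
                peer[i].current_weight--;
                return i;
            }
        }

        if (reset++) {
            /* already refilled once and still found nothing */
            return 0;
        }

        for (i = 0; i < number; i++) {
            peer[i].current_weight = peer[i].weight;
        }
    }
}
```

Without the reset check, calling pick_peer() on a set of peers whose weights are all 0 would never return in this sketch, because every refill leaves the set exactly as it was.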
I finally reverted the patch and restarted the service, and we continue to see this behavior. So my conclusion is that, for my specific problem, this patch is not a solution.
--
Oded <oded at geek.co.il>
diff --git a/src/http/ngx_http_upstream_round_robin.c b/src/http/ngx_http_upstream_round_robin.c
--- a/src/http/ngx_http_upstream_round_robin.c
+++ b/src/http/ngx_http_upstream_round_robin.c
@@ -583,7 +583,7 @@ failed:
 static ngx_uint_t
 ngx_http_upstream_get_peer(ngx_http_upstream_rr_peers_t *peers)
 {
-    ngx_uint_t                    i, n;
+    ngx_uint_t                    i, n, reset = 0;
     ngx_http_upstream_rr_peer_t  *peer;
 
     peer = &peers->peer[0];
@@ -622,6 +622,10 @@ ngx_http_upstream_get_peer(ngx_http_upst
             return n;
         }
 
+        if (reset++) {
+            return 0;
+        }
+
         for (i = 0; i < peers->number; i++) {
             peer[i].current_weight = peer[i].weight;
         }