[PATCH] Fixed Nginx 1.1.1 eating 100% CPU time on occasions

Wed Aug 31 00:02:49 UTC 2011

Hello!

On Tue, Aug 30, 2011 at 09:22:20PM +0400, Artyom Gavrichenkov wrote:

> Problem:
> 
> 2011/08/30 19:35:05 [debug] 3186#0: *5193 http upstream request:
> "/ximaera/images/whiting_buddh.png?"
> 2011/08/30 19:35:05 [debug] 3186#0: *5193 http upstream process header
> 2011/08/30 19:35:05 [debug] 3186#0: *5193 malloc: 0000000002239950:65536
> 2011/08/30 19:35:05 [debug] 3186#0: *5193 recv: fd:252 0 of 65536
> 2011/08/30 19:35:05 [error] 3186#0: *5193 upstream prematurely closed
> connection while reading response header from upstream, client:
> 217.26.0.104, server: , request: "GET
> /ximaera/images/whiting_buddh.png HTTP/1.1", upstream:
> "http://192.168.1.5:80/ximaera/images/whiting_buddh.png", host:
> "www.ximaera.name", referrer: "http://twitter.com/"
> 2011/08/30 19:35:05 [debug] 3186#0: *5193 http next upstream, 2
> 2011/08/30 19:35:05 [debug] 3186#0: *5193 free keepalive peer
> 2011/08/30 19:35:05 [debug] 3186#0: *5193 free rr peer 1 4
> 2011/08/30 19:35:05 [debug] 3186#0: *5193 free rr peer failed: 0 -1
> 2011/08/30 19:35:05 [debug] 3186#0: *5193 close http upstream connection: 252
> 2011/08/30 19:35:05 [debug] 3186#0: *5193 event timer del: 252: 1314718565804
> 2011/08/30 19:35:05 [debug] 3186#0: *5193 reusable connection: 0
> 2011/08/30 19:35:05 [debug] 3186#0: *5193 get keepalive peer
> 2011/08/30 19:35:05 [debug] 3186#0: *5193 get rr peer, try: 0
> 2011/08/30 19:35:05 [debug] 3186#0: *5193 [XIMAERA] before
> round_robin.c:505, try: 0
> 2011/08/30 19:35:05 [debug] 3186#0: *5193 [XIMAERA] before
> round_robin.c:508, try: 18446744073709551615
> 2011/08/30 19:35:05 [debug] 3186#0: *5193 [XIMAERA] before
> round_robin.c:514, try: 18446744073709551615
> 
> After the unsuccessful call to
> ngx_http_upstream_free_round_robin_peer() we had (pc->tries == 0).
> Then we called ngx_http_upstream_get_round_robin_peer() with
> (pc->tries == 0 && rrp->peers->number == 1).
> At ngx_http_upstream_round_robin.c:505 we did pc->tries-- and started
> to count down from 0xffffffffffffffff to zero.

This is an upstream keepalive-related problem.

Normally a connection should not be retried if peer.tries == 0 (see
ngx_http_upstream_next() in ngx_http_upstream.c).  But in case of
errors on a cached connection this check is bypassed, and this causes
the CPU hog you observed.

I'll take a look at how to fix this properly.  Though it looks like
there is no easy fix: e.g. in case of ip_hash we need to retry the
same upstream server in such situations...

Maxim Dounin