[BUG?]fail_timeout/max_fails: code doesn't do what doc says

Maxim Dounin mdounin at mdounin.ru
Mon May 20 17:11:03 UTC 2013


Hello!

On Mon, May 20, 2013 at 01:34:26AM +0400, Dmitry Popov wrote:

> Hi.
> 
> http://wiki.nginx.org/HttpUpstreamModule says
> max_fails = NUMBER - number of unsuccessful attempts at communicating with the 
> server within the time period (assigned by parameter fail_timeout) after which 
> it is considered inoperative ...
> fail_timeout = TIME - the time during which must occur *max_fails* number of 
> unsuccessful attempts at communication with the server that would cause the 
> server to be considered inoperative ...
> 
> However, as we may see from code (ngx_http_upstream_get_peer and 
> ngx_http_upstream_free_round_robin_peer
> from src/http/ngx_http_upstream_round_robin.c) the logic is not as described:
> (simplified code)
> get_peer:
>   if (fails >= max_fails && now <= fail_timeout + checked)
>     skip
>   ...
>   checked = now
> free_peer:
>   if (request_failed)
>     fails++
>     accessed = now
>     checked = now
>   else
>     if (accessed < checked)
>       fails = 0
> 
> 1) So, fail_timeout is never used while peer is gaining fails (until 
> fails >= max_fails);
> 2) This algorithm always resets fails count if first request inside new second
> succeeds; it always increases fails count if first request fails. So, a lot
> depends on first (inside a second) request; I don't think it's a desired 
> behaviour.

This looks like a bug introduced in 1.3.0, thanks.

The peer->fails counter should be only reset after fail_timeout 
passed either since previous check or since previous failure.  
This was previously achieved by only bumping peer->checked once 
per fail_timeout, but was accedentally omitted in revision 
c90801720a0c.  Correct code is still here in the ip_hash balancer.

Patch:

diff --git a/src/http/modules/ngx_http_upstream_least_conn_module.c b/src/http/modules/ngx_http_upstream_least_conn_module.c
--- a/src/http/modules/ngx_http_upstream_least_conn_module.c
+++ b/src/http/modules/ngx_http_upstream_least_conn_module.c
@@ -282,7 +282,10 @@ ngx_http_upstream_get_least_conn_peer(ng
     }
 
     best->current_weight -= total;
-    best->checked = now;
+
+    if (now - best->checked > best->fail_timeout) {
+        best->checked = now;
+    }
 
     pc->sockaddr = best->sockaddr;
     pc->socklen = best->socklen;
diff --git a/src/http/ngx_http_upstream_round_robin.c b/src/http/ngx_http_upstream_round_robin.c
--- a/src/http/ngx_http_upstream_round_robin.c
+++ b/src/http/ngx_http_upstream_round_robin.c
@@ -523,7 +523,10 @@ ngx_http_upstream_get_peer(ngx_http_upst
     rrp->tried[n] |= m;
 
     best->current_weight -= total;
-    best->checked = now;
+
+    if (now - best->checked > best->fail_timeout) {
+        best->checked = now;
+    }
 
     return best;
 }

-- 
Maxim Dounin
http://nginx.org/en/donation.html



More information about the nginx-devel mailing list