[BUG?]fail_timeout/max_fails: code doesn't do what doc says
Maxim Dounin
mdounin at mdounin.ru
Mon May 20 17:11:03 UTC 2013
Hello!
On Mon, May 20, 2013 at 01:34:26AM +0400, Dmitry Popov wrote:
> Hi.
>
> http://wiki.nginx.org/HttpUpstreamModule says
> max_fails = NUMBER - number of unsuccessful attempts at communicating with the
> server within the time period (assigned by parameter fail_timeout) after which
> it is considered inoperative ...
> fail_timeout = TIME - the time during which must occur *max_fails* number of
> unsuccessful attempts at communication with the server that would cause the
> server to be considered inoperative ...
>
> However, as we may see from code (ngx_http_upstream_get_peer and
> ngx_http_upstream_free_round_robin_peer
> from src/http/ngx_http_upstream_round_robin.c) the logic is not as described:
> (simplified code)
> get_peer:
> if (fails >= max_fails && now <= fail_timeout + checked)
> skip
> ...
> checked = now
> free_peer:
> if (request_failed)
> fails++
> accessed = now
> checked = now
> else
> if (accessed < checked)
> fails = 0
>
> 1) So, fail_timeout is never used while peer is gaining fails (until
> fails >= max_fails);
> 2) This algorithm always resets fails count if first request inside new second
> succeeds; it always increases fails count if first request fails. So, a lot
> depends on first (inside a second) request; I don't think it's a desired
> behaviour.
This looks like a bug introduced in 1.3.0, thanks.
The peer->fails counter should be only reset after fail_timeout
passed either since previous check or since previous failure.
This was previously achieved by only bumping peer->checked once
per fail_timeout, but was accedentally omitted in revision
c90801720a0c. Correct code is still here in the ip_hash balancer.
Patch:
diff --git a/src/http/modules/ngx_http_upstream_least_conn_module.c b/src/http/modules/ngx_http_upstream_least_conn_module.c
--- a/src/http/modules/ngx_http_upstream_least_conn_module.c
+++ b/src/http/modules/ngx_http_upstream_least_conn_module.c
@@ -282,7 +282,10 @@ ngx_http_upstream_get_least_conn_peer(ng
}
best->current_weight -= total;
- best->checked = now;
+
+ if (now - best->checked > best->fail_timeout) {
+ best->checked = now;
+ }
pc->sockaddr = best->sockaddr;
pc->socklen = best->socklen;
diff --git a/src/http/ngx_http_upstream_round_robin.c b/src/http/ngx_http_upstream_round_robin.c
--- a/src/http/ngx_http_upstream_round_robin.c
+++ b/src/http/ngx_http_upstream_round_robin.c
@@ -523,7 +523,10 @@ ngx_http_upstream_get_peer(ngx_http_upst
rrp->tried[n] |= m;
best->current_weight -= total;
- best->checked = now;
+
+ if (now - best->checked > best->fail_timeout) {
+ best->checked = now;
+ }
return best;
}
--
Maxim Dounin
http://nginx.org/en/donation.html
More information about the nginx-devel
mailing list