[BUG?] fail_timeout/max_fails: code doesn't do what doc says
Dmitry Popov
dp at highloadlab.com
Sun May 19 21:34:26 UTC 2013
Hi.
http://wiki.nginx.org/HttpUpstreamModule says:
max_fails = NUMBER - number of unsuccessful attempts at communicating with the
server within the time period (assigned by parameter fail_timeout) after which
it is considered inoperative ...
fail_timeout = TIME - the time during which must occur *max_fails* number of
unsuccessful attempts at communication with the server that would cause the
server to be considered inoperative ...
However, as we can see from the code (ngx_http_upstream_get_peer and
ngx_http_upstream_free_round_robin_peer in
src/http/ngx_http_upstream_round_robin.c), the logic is not as described:
(simplified code)

get_peer:
    if (fails >= max_fails && now <= fail_timeout + checked)
        skip
    ...
    checked = now

free_peer:
    if (request_failed)
        fails++
        accessed = now
        checked = now
    else
        if (accessed < checked)
            fails = 0

1) fail_timeout is never consulted while a peer is accumulating fails (that
is, until fails >= max_fails).
2) The algorithm always resets the fails count if the first request in a new
second succeeds, and always increases it if that first request fails. So a lot
depends on the first request within each second; I don't think this is the
desired behaviour.
3) I'm not sure "accessed" is a good name for a field that holds the timestamp
of the last failure.
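
To make points 1) and 2) concrete, here is a minimal sketch in C that models
only the simplified logic quoted in this message. It is not the real nginx
code: peer_model_t, the model_* functions and the chosen timings are all
hypothetical names and values made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of one upstream peer. Field names follow the
 * simplified pseudocode in this message, not the real nginx structs. */
typedef struct {
    unsigned max_fails;
    time_t   fail_timeout;
    unsigned fails;
    time_t   accessed;   /* time of the last failure */
    time_t   checked;    /* time of the last selection or failure */
} peer_model_t;

/* get_peer: returns false if the peer is skipped at time `now`. */
bool model_get_peer(peer_model_t *p, time_t now)
{
    assert(p != NULL);
    if (p->fails >= p->max_fails && now <= p->fail_timeout + p->checked) {
        return false;
    }
    p->checked = now;
    return true;
}

/* free_peer: record the outcome of a request that finished at `now`. */
void model_free_peer(peer_model_t *p, time_t now, bool request_failed)
{
    assert(p != NULL);
    if (request_failed) {
        p->fails++;
        p->accessed = now;
        p->checked = now;
    } else if (p->accessed < p->checked) {
        p->fails = 0;
    }
}

/* One whole request: select the peer, then report the result. */
bool model_request(peer_model_t *p, time_t now, bool request_failed)
{
    if (!model_get_peer(p, now)) {
        return false;
    }
    model_free_peer(p, now, request_failed);
    return true;
}

/* Point 1: three failures 100 s apart, with fail_timeout = 10 s and
 * max_fails = 3. Returns the fails count afterwards. */
unsigned model_spaced_failures(void)
{
    peer_model_t p = { .max_fails = 3, .fail_timeout = 10 };
    for (time_t t = 0; t <= 200; t += 100) {
        model_request(&p, t, true);
    }
    return p.fails;
}

/* Point 2: one failure at t = 0, then the first request of second
 * t = 1 either fails or succeeds, followed by another success in the
 * same second. Returns the fails count afterwards. */
unsigned model_first_request_matters(bool first_in_new_second_fails)
{
    peer_model_t p = { .max_fails = 10, .fail_timeout = 10 };
    model_request(&p, 0, true);    /* failure at t = 0 */
    model_request(&p, 0, false);   /* success in the same second */
    model_request(&p, 1, first_in_new_second_fails);
    model_request(&p, 1, false);   /* later success in second t = 1 */
    return p.fails;
}
```

In this model, model_spaced_failures() reaches max_fails even though the
failures are far further apart than fail_timeout, and the result of
model_first_request_matters() depends entirely on whether the first request
of the new second failed, even though a success follows in both cases.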
I don't know where the error is (in the doc or in the code), and I don't know
how you (the nginx devs) intended it to work, so I don't have any constructive
suggestions, sorry.
--
Dmitry Popov
Highloadlab