upstream max_fails/fail_timeout logic?
mdounin at mdounin.ru
Mon Feb 1 13:59:35 UTC 2016
On Sat, Jan 30, 2016 at 05:31:14PM +0100, Thomas Nyberg wrote:
> Hello, I've set up an http proxy to a couple of other servers and am using
> max_fails and fail_timeout in addition to having a proxy_read_timeout to force
> failover in case of a read timeout. It seems to work fine, but I have two
> questions:
> 1) I'm not totally understanding the logic. I can tell that if the timeout
> hits the max number of times, it must sit out for the rest of the
> fail_timeout time and then it seems to start working again at the end of the
> time. But it also seems like it only needs to fail once (i.e. not a full set
> of max_fails) to be removed from consideration again. But then it seems like
> if it doesn't fail again for a long time, it needs to fail max_fails times
> again. How does this logic work exactly?
After fail_timeout, one request will be passed to the server in
question. The server is considered alive again if the request
succeeds. If the request fails, nginx will wait for fail_timeout
again.
Note that this is consistent with the max_fails counting logic as
well: failures are not counted within a sliding window of
fail_timeout, but within a session that times out after
fail_timeout. That is, fail_timeout defines the minimal interval
between failures for nginx to forget about previous failures.
E.g., with max_fails=5 fail_timeout=10s, if a server fails 1
request every 5 seconds, it will be considered down once 5
failures have accumulated over the previous 20 seconds.
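
The counting rules described above can be sketched as a small model.
This is only an illustration of the behaviour as explained in this
message, not nginx's actual code: the Peer class, its method names and
the use of plain numbers for time are all assumptions made for the
example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peer:
    """Hypothetical model of one upstream server (not nginx's real structure)."""
    max_fails: int
    fail_timeout: float
    fails: int = 0
    last_failure: Optional[float] = None

    def _quiet(self, now: float) -> bool:
        # True if more than fail_timeout has passed since the last failure.
        return self.last_failure is not None and now - self.last_failure > self.fail_timeout

    def available(self, now: float) -> bool:
        # Below max_fails the server is always used; once disabled it is
        # tried again only after fail_timeout has passed.
        return self.fails < self.max_fails or self._quiet(now)

    def on_failure(self, now: float) -> None:
        disabled = self.fails >= self.max_fails
        # A long enough gap between failures makes earlier ones forgotten,
        # but a failed probe of a disabled server keeps it disabled.
        if self._quiet(now) and not disabled:
            self.fails = 0
        self.fails += 1
        self.last_failure = now

    def on_success(self, now: float) -> None:
        # A success after the quiet interval (including a probe) clears the count.
        if self._quiet(now):
            self.fails = 0


# The example from the text: max_fails=5, fail_timeout=10s, one failure every 5s.
peer = Peer(max_fails=5, fail_timeout=10)
for t in (0, 5, 10, 15, 20):
    peer.on_failure(t)
print(peer.available(20))    # disabled right after the fifth failure
print(peer.available(30.5))  # one request is let through after fail_timeout
```

In this model a single failed probe at, say, t=31 disables the server
for another fail_timeout without needing max_fails new failures, while
failures spaced more than fail_timeout apart never accumulate.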
> 2) Is the fact that an upstream server is taken down (in this temporary
> fashion) logged somewhere? I.e. some file where it just says "server hit max
> fails" or something?
In recent versions (1.9.1+) the "upstream server temporarily
disabled" warning will be logged.
> 3) Extending 2), is there any way to "hook" into that server failure? I.e.
> if the server fails, is there a way with nginx to execute some sort of a
> program (either internal or external)?
No (except by monitoring logs).
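
A minimal sketch of such log monitoring follows, assuming the warning
text quoted above appears somewhere in each relevant error log line.
The watch function, the callback and the log path in the comment are
hypothetical; the exact layout of the rest of the line is not assumed.

```python
from typing import Callable, Iterable

# Substring of the warning mentioned above (logged by nginx 1.9.1+).
MARKER = "upstream server temporarily disabled"


def watch(lines: Iterable[str], on_disabled: Callable[[str], None]) -> int:
    """Call on_disabled for every line containing MARKER; return the match count."""
    count = 0
    for line in lines:
        if MARKER in line:
            on_disabled(line.rstrip("\n"))
            count += 1
    return count


# Possible usage (the path is an assumption, adjust to your error_log setting):
#
#     with open("/var/log/nginx/error.log") as log:
#         watch(log, print)
```

The callback is where an external program or alert could be triggered.
Keep in mind that each worker logs its own view of the server state.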
Note well that the "down" state is per worker process (unless you
are using an upstream zone to share state between worker
processes), and this also complicates things.
In general it's a good idea to monitor backends separately, and
not to expect nginx to do anything if a backend fails.