upstream max_fails/fail_timeout logic?

Mon Feb 1 13:59:35 UTC 2016

Hello!

On Sat, Jan 30, 2016 at 05:31:14PM +0100, Thomas Nyberg wrote:

> Hello I've set up an http proxy to a couple of other servers and am using
> max_fails and fail_timeout in addition to having a proxy_read_timeout to force
> failover in case of a read timeout. It seems to work fine, but I have two
> questions.
> 
> 1) I'm not totally understanding the logic. I can tell that if the timeout
> hits the max number of times, it must sit out for the rest of the
> fail_timeout time and then it seems to start working again at the end of the
> time. But it also seems like it only needs to fail once (i.e. not a full set
> of max_fails) to be removed from consideration again. But then it seems like
> it doesn't fail again for a long time, it needs to fail max_fails times
> again. How does this logic work exactly?

After fail_timeout, one request will be passed to the server in 
question.  The server is considered alive again if the request 
succeeds.  If the request fails, nginx will wait for fail_timeout 
again.

Note that this is consistent with the max_fails counting logic 
as well: failures are not counted within a sliding window of 
fail_timeout, but within a session that expires after 
fail_timeout without failures.  That is, fail_timeout defines the 
minimal interval between failures for nginx to forget about 
previous failures.

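E.g., with max_fails=5 fail_timeout=10s, if a server fails 1 
request every 5 seconds, it will be considered down after the 
5th failure, that is, after 5 failures within 20 seconds: no two 
consecutive failures are more than fail_timeout apart, so the 
counter is never reset.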
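
For reference, a minimal sketch of a configuration matching the 
example above (to be placed in the http context).  The upstream 
name, the backend addresses and the 5s read timeout are 
assumptions for illustration, not taken from the original 
question:

```nginx
# Hypothetical backends; max_fails/fail_timeout as in the example above.
upstream backend {
    server 192.0.2.10:8080 max_fails=5 fail_timeout=10s;
    server 192.0.2.11:8080 max_fails=5 fail_timeout=10s;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;

        # Arbitrary value: a timeout while reading the response
        # header counts as a failed attempt for max_fails.
        proxy_read_timeout 5s;

        # "error timeout" is the default; spelled out for clarity.
        proxy_next_upstream error timeout;
    }
}
```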

> 2) Is the fact that an upstream server is taken down (in this temporary
> fashion) logged somewhere? I.e. some file where it just says "server hit max
> fails" or something?

In recent versions (1.9.1+) the "upstream server temporarily 
disabled" warning will be logged.
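
Note that the message is logged at the "warn" level, while the 
default error_log level is "error", so it will only show up if 
the log level is lowered accordingly.  A sketch, assuming a 
typical log path (adjust to your installation):

```nginx
# The path is an assumption; "warn" is needed to see the
# "upstream server temporarily disabled" message.
error_log /var/log/nginx/error.log warn;
```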

> 3) Extending 2), is there any way to "hook" into that server failure? I.e.
> if the server fails, is there a way with nginx to execute some sort of a
> program (either internal or external)?

No (except by monitoring logs).

Note well that the "down" state is per worker process (unless 
you are using an upstream zone to share state between worker 
processes), so different workers may disagree on whether a 
server is down, and this also complicates things.
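
A sketch of an upstream block with a shared memory zone, so that 
failure counters are shared between worker processes.  The zone 
name, its size and the backend addresses here are assumptions 
for illustration:

```nginx
upstream backend {
    # Hypothetical zone name and size; keeps the upstream
    # state in shared memory, common to all workers.
    zone backend_zone 64k;

    server 192.0.2.10:8080 max_fails=5 fail_timeout=10s;
    server 192.0.2.11:8080 max_fails=5 fail_timeout=10s;
}
```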

In general it's a good idea to monitor backends separately, and 
not to expect nginx to do anything if a backend fails.

-- 
Maxim Dounin
http://nginx.org/