incorrect upstream max_fails behaviour

Maxim Dounin mdounin at
Fri Mar 27 15:04:16 UTC 2020


On Thu, Mar 26, 2020 at 08:08:01PM +0100, Jan Prachař wrote:

> Hello,
> the upstream module documentation says:
> max_fails=number
> sets the number of unsuccessful attempts to communicate with the server
> that should happen in the duration set by the fail_timeout parameter to
> consider the server unavailable for a duration also set by the
> fail_timeout parameter.
> And also:
> fail_timeout=time
> sets
> the time during which the specified number of unsuccessful attempts to
> communicate with the server should happen to consider the server
> unavailable;
> Load balancing documentation at 
> says:
> The max_fails directive sets the number of consecutive unsuccessful
> attempts to communicate with the server that should happen during
> fail_timeout.
> But I have found that the actual nginx behaviour is different. Every
> time an upstream fails, peer->accessed and peer->checked are set to now
> and peer->fails is incremented. peer->checked is also set to now
> before connecting to upstream, if
> now - peer->checked > peer->fail_timeout. (1)
> peer->fails is set to 0 only for a successful request if peer->accessed <
> peer->checked, which can happen only if condition (1) was fulfilled.
> Therefore, peer->fails is set to zero only if no upstream error
> happens during the fail_timeout interval. So, for example, if an upstream
> fails once every fail_timeout, it will be marked as unavailable after
> max_fails*fail_timeout.
> Or, if there are no successful requests to an upstream, peer->fails is
> incremented with every request independently of the fail_timeout setting.
> My test confirms that nginx indeed behaves like this.
> Is the documented behavior only part of the commercial subscription, or
> am I missing something?

Documentation somewhat oversimplifies things.  The fail_timeout 
setting is essentially a session timeout, and things work as follows:

1. As long as there are failures, the fails counter is 
incremented.  If fail_timeout passes since the last failure, the fails 
counter is reset to 0 on the next successful request.

2. If the fails counter reaches max_fails, no more requests are 
routed to the peer for fail_timeout time.  After fail_timeout passes, 
one request is allowed.  If the request is successful, the fails 
counter is reset to 0, and further requests to the peer are 
allowed without any limits.
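
The two rules above can be sketched as a small Python model.  This is 
only an illustration of the accounting described in this thread, not 
the nginx source: the Peer class and its available() and request() 
methods are hypothetical names, times are plain integer seconds, and 
requests are assumed to arrive one at a time.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Simplified, assumed model of one upstream server's failure accounting."""
    max_fails: int = 1
    fail_timeout: int = 10   # seconds
    fails: int = 0
    accessed: int = 0        # time of the last failure
    checked: int = 0         # time of the last failure or of the last re-check

    def available(self, now: int) -> bool:
        # Below the limit (or limit disabled): the peer can be used.
        if self.max_fails == 0 or self.fails < self.max_fails:
            return True
        # At the limit: usable again only once fail_timeout has passed.
        return now - self.checked > self.fail_timeout

    def request(self, now: int, ok: bool) -> bool:
        """Send one request at time `now`; return False if the peer is skipped."""
        if not self.available(now):
            return False
        # Condition (1) from the quoted message: start a new check window.
        if now - self.checked > self.fail_timeout:
            self.checked = now
        if ok:
            # Reset only if no failure happened since the window started.
            if self.accessed < self.checked:
                self.fails = 0
        else:
            self.fails += 1
            self.accessed = now
            self.checked = now
        return True


peer = Peer(max_fails=3, fail_timeout=10)
# One failure every 11 seconds: each falls outside the previous
# fail_timeout window, yet the counter keeps accumulating.
for t in (100, 111, 122):
    peer.request(t, ok=False)
print(peer.fails, peer.available(125), peer.available(133))
```

In this model the counter reaches max_fails after the third failure, 
the peer is skipped at t=125, and it is offered a request again at 
t=133.  A successful request at that point resets fails to 0, while a 
success that arrives within fail_timeout of a failure leaves the 
counter unchanged.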

Maxim Dounin

More information about the nginx-devel mailing list