incorrect upstream max_fails behaviour

Maxim Dounin mdounin at mdounin.ru
Fri Mar 27 15:04:16 UTC 2020


Hello!

On Thu, Mar 26, 2020 at 08:08:01PM +0100, Jan Prachař wrote:

> Hello,
> 
> the upstream module documentation says:
> 
> 
> max_fails=number
> sets the number of unsuccessful attempts to communicate with the server
> that should happen in the duration set by the fail_timeout parameter to
> consider the server unavailable for a duration also set by the
> fail_timeout parameter.
> 
> 
> And also:
> 
> 
> fail_timeout=time
> sets
> the time during which the specified number of unsuccessful attempts to
> communicate with the server should happen to consider the server
> unavailable;
> 
> 
> Load balancing documentation at 
> http://nginx.org/en/docs/http/load_balancing.html says:
> 
> 
> The max_fails directive sets the number of consecutive unsuccessful
> attempts to communicate with the server that should happen during
> fail_timeout.
> 
> 
> But I have found that the actual nginx behaviour is different. Every
> time an upstream fails, peer->accessed and peer->checked are set to now
> and peer->fails is incremented. peer->checked is also set to now
> before connecting to the upstream, if
> 
> now - peer->checked > peer->fail_timeout. (1)
> 
> peer->fails is set to 0 only on a successful request if peer->accessed <
> peer->checked, which can happen only if condition (1) was fulfilled.
> Therefore, peer->fails is set to zero only if no upstream error
> happens during a fail_timeout interval. So, for example, if an upstream
> fails once every fail_timeout, it will be marked as unavailable after
> max_fails*fail_timeout.
> 
> Or if there are no successful requests to an upstream, peer->fails is
> incremented with every request independently of the fail_timeout setting.
> My test confirms that nginx indeed behaves like this.
> 
> Is the documented behavior only part of the commercial subscription, or
> am I missing something?

The documentation somewhat oversimplifies things.  The fail_timeout 
setting is essentially a session timeout, and things work as 
follows:

1. As long as there are failures, the fails counter is 
incremented.  If fail_timeout has passed since the last failure, the 
fails counter is reset to 0 on the next successful request.

2. If the fails counter reaches max_fails, no more requests are 
routed to the peer for fail_timeout time.  After fail_timeout passes, 
one request is allowed.  If the request is successful, the fails 
counter is reset to 0, and further requests to the peer are 
allowed without any limits.
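
The two rules above can be illustrated with a minimal Python sketch.
It is a simplified model built only from the description in this
thread, not nginx code: the Peer class and the request() helper are
hypothetical names, time is a plain number of seconds, and only a
single peer is modelled.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Simplified model of one upstream peer (hypothetical, not nginx code)."""
    max_fails: int = 1
    fail_timeout: float = 10.0
    fails: int = 0
    accessed: float = 0.0   # time of the last failure
    checked: float = 0.0    # time of the last failure or last "probe"

    def available(self, now: float) -> bool:
        # Rule 2: once fails reaches max_fails, skip the peer
        # until fail_timeout has passed since "checked".
        blocked = (self.max_fails
                   and self.fails >= self.max_fails
                   and now - self.checked <= self.fail_timeout)
        return not blocked

    def select(self, now: float) -> None:
        # Condition (1) from the quoted message: refresh "checked"
        # before connecting if fail_timeout has passed.
        if now - self.checked > self.fail_timeout:
            self.checked = now

    def report(self, now: float, ok: bool) -> None:
        if ok:
            # Rule 1: reset only if fail_timeout passed since the last failure.
            if self.accessed < self.checked:
                self.fails = 0
        else:
            self.fails += 1
            self.accessed = now
            self.checked = now


def request(peer: Peer, now: float, ok: bool) -> bool:
    """Try to route one request; return False if the peer was skipped."""
    if not peer.available(now):
        return False
    peer.select(now)
    peer.report(now, ok)
    return True


if __name__ == "__main__":
    # One failure every fail_timeout, with successes in between.
    peer = Peer(max_fails=3, fail_timeout=10)
    for now, ok in [(0, False), (5, True), (10, False),
                    (15, True), (20, False), (25, True)]:
        routed = request(peer, now, ok)
        print(now, "routed" if routed else "skipped", "fails =", peer.fails)
```

In this model the successes at 5 and 15 seconds do not reset the
counter, because fail_timeout has not passed since the previous
failure, so the third failure at 20 seconds makes the peer
unavailable and the request at 25 seconds is skipped.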

-- 
Maxim Dounin
http://mdounin.ru/

