incorrect upstream max_fails behaviour
Maxim Dounin
mdounin at mdounin.ru
Fri Mar 27 15:04:16 UTC 2020
Hello!
On Thu, Mar 26, 2020 at 08:08:01PM +0100, Jan Prachař wrote:
> Hello,
>
> the upstream module documentation says:
>
>
> max_fails=number
> sets the number of unsuccessful attempts to communicate with the server
> that should happen in the duration set by the fail_timeout parameter to
> consider the server unavailable for a duration also set by the
> fail_timeout parameter.
>
>
> And also:
>
>
> fail_timeout=time
> sets the time during which the specified number of unsuccessful attempts
> to communicate with the server should happen to consider the server
> unavailable;
>
>
> Load balancing documentation at
> http://nginx.org/en/docs/http/load_balancing.html says:
>
>
> The max_fails directive sets the number of consecutive unsuccessful
> attempts to communicate with the server that should happen during
> fail_timeout.
>
>
> But I have found that the actual nginx behaviour is different. Every
> time an upstream fails, peer->accessed and peer->checked are set to now
> and peer->fails is incremented. peer->checked is also set to now
> before connecting to the upstream if
>
> now - peer->checked > peer->fail_timeout. (1)
>
> peer->fails is reset to 0 on a successful request only if
> peer->accessed < peer->checked, which can happen only if condition (1)
> was fulfilled. Therefore, peer->fails is reset to zero only if no
> upstream error happens during the fail_timeout interval. So, for
> example, if an upstream fails once every fail_timeout, it will be
> marked as unavailable after max_fails*fail_timeout.
>
> Or, if there are no successful requests to an upstream, peer->fails is
> incremented with every request regardless of the fail_timeout setting.
> My test confirms that nginx indeed behaves like this.
>
> Is the documented behavior only part of the commercial subscription, or
> am I missing something?
Documentation somewhat oversimplifies things. The fail_timeout
setting is essentially a session timeout, and things work as
follows:
1. As long as there are failures, the fails counter is
incremented. If fail_timeout has passed since the last failure, the
fails counter is reset to 0 on the next successful request.
2. If the fails counter reaches max_fails, no more requests are
routed to the peer for fail_timeout time. After fail_timeout passes,
one request is allowed. If the request is successful, the fails
counter is reset to 0, and further requests to the peer are
allowed without any limits.
--
Maxim Dounin
http://mdounin.ru/
More information about the nginx-devel mailing list