incorrect upstream max_fails behaviour

Thu Mar 26 19:08:01 UTC 2020

Hello,

the upstream module documentation says:

max_fails=number
sets the number of unsuccessful attempts to communicate with the server
that should happen in the duration set by the fail_timeout parameter to
consider the server unavailable for a duration also set by the
fail_timeout parameter.

And also:

fail_timeout=time
sets the time during which the specified number of unsuccessful attempts
to communicate with the server should happen to consider the server
unavailable;

The load balancing documentation at
http://nginx.org/en/docs/http/load_balancing.html says:

The max_fails directive sets the number of consecutive unsuccessful
attempts to communicate with the server that should happen during
fail_timeout.
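For comparison, here is a minimal Python sketch of one reading of the documented semantics. A server becomes unavailable once max_fails failures fall within a sliding fail_timeout window, and it then stays unavailable for fail_timeout. This is an assumption for illustration, not nginx code. DocumentedPeer is a hypothetical name, and the sketch ignores the word "consecutive". Both readings agree that failures spaced more than fail_timeout apart should never make the server unavailable.

```python
from collections import deque


class DocumentedPeer:
    """Hypothetical model of the *documented* behaviour (an assumption)."""

    def __init__(self, max_fails: int, fail_timeout: float) -> None:
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.failures = deque()          # timestamps of recent failures
        self.down_until = float("-inf")  # unavailable before this time

    def available(self, now: float) -> bool:
        return now >= self.down_until

    def record_failure(self, now: float) -> None:
        self.failures.append(now)
        # Drop failures that are older than the fail_timeout window.
        while self.failures and now - self.failures[0] > self.fail_timeout:
            self.failures.popleft()
        # Enough failures inside the window: mark unavailable for fail_timeout.
        if len(self.failures) >= self.max_fails:
            self.down_until = now + self.fail_timeout
```

Under this reading, three failures 100 seconds apart with max_fails=3 and fail_timeout=10 never make the server unavailable. Three failures within 8 seconds do.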

But I have found that the actual nginx behaviour is different. Every
time a request to an upstream server fails, peer->accessed and
peer->checked are set to now and peer->fails is incremented.
peer->checked is also set to now before connecting to the upstream
server if

now - peer->checked > peer->fail_timeout. (1)

peer->fails is reset to 0 by a successful request only if
peer->accessed < peer->checked, which can happen only if condition (1)
was fulfilled since the last failure. Therefore, peer->fails is reset to
zero only if no upstream error happens during a whole fail_timeout
interval. For example, if an upstream server fails once every
fail_timeout, it will be marked as unavailable within about
max_fails*fail_timeout, even though it never fails more than once per
fail_timeout.

Or, if there are no successful requests to an upstream server,
peer->fails is incremented with every failed request regardless of the
fail_timeout setting. My tests confirm that nginx indeed behaves like
this.
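Here is a minimal Python model of the bookkeeping described above. It uses integer seconds and fields named after those in the description, and it is a simplified sketch rather than nginx source. One part is my assumption about how "unavailable" is decided: a peer is skipped while fails >= max_fails and now - checked <= fail_timeout. Peer, try_select, finish and run are hypothetical names.

```python
class Peer:
    """Simplified model of the per-peer failure accounting described above."""

    def __init__(self, max_fails: int, fail_timeout: int) -> None:
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.fails = 0
        self.accessed = 0
        self.checked = 0

    def try_select(self, now: int) -> bool:
        # Assumed availability rule: skip while the failure count is at the
        # limit and the last check is within fail_timeout.
        if (self.max_fails and self.fails >= self.max_fails
                and now - self.checked <= self.fail_timeout):
            return False
        # Condition (1): start a new check interval before connecting.
        if now - self.checked > self.fail_timeout:
            self.checked = now
        return True

    def finish(self, now: int, failed: bool) -> None:
        if failed:
            self.fails += 1
            self.accessed = now
            self.checked = now
        elif self.accessed < self.checked:
            # Only possible if condition (1) fired after the last failure.
            self.fails = 0


def run(events, max_fails=3, fail_timeout=10):
    """events: (time, failed) pairs; returns times the peer was skipped."""
    peer = Peer(max_fails, fail_timeout)
    skipped = []
    for now, failed in events:
        if not peer.try_select(now):
            skipped.append(now)
            continue
        peer.finish(now, failed)
    return skipped
```

Two scenarios reproduce the behaviour described above. In the first, there is one request per second and one failure every fail_timeout (at t=0, 10, 20), and the peer is skipped from t=21 onward. In the second, there are only failures, 100 seconds apart, and the third one still marks the peer unavailable.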

Is the documented behaviour only part of the commercial subscription, or
am I missing something?

Best regards,
Jan Prachař