Question about failure and fail-over

Thu Jul 18 11:10:27 UTC 2013

Hi all, I have a general question about server failure and failover
within an upstream group to ensure I understand it correctly.

Let's say I have the following configuration:

proxy_next_upstream timeout;
proxy_connect_timeout 5;
...
upstream backend {
  server 127.0.0.1 max_fails=3 fail_timeout=10s;
  server 127.0.0.2 max_fails=3 fail_timeout=10s;
  server 127.0.0.3 max_fails=3 fail_timeout=10s;
}

And then the server 127.0.0.1 starts "hanging" indefinitely on
connection attempts.

a) Once 3 connection attempts to 127.0.0.1 have each timed out after 5
seconds, it will be marked down. However, during that 5-second timeout,
30 (or, in general, N) other connections/requests may also be in the
process of timing out, so you may end up with 30 internal connection
failures caused by 127.0.0.1's problem. Although they are all retried on
the next available upstream, 30 end users will notice a 5-second hang
in their requests while waiting for the timeout to occur.

b) Once the 10-second fail_timeout expires, if the server is still
hanging, scenario a) basically repeats in the same way.
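To sanity-check the arithmetic in a) and b), here is a rough Python
model of the scenario. It is a hypothetical simulation, not nginx's
balancer code: it assumes a single worker, a fixed request rate, plain
round-robin over the three servers, that a failure is recorded only when
a connect attempt actually times out, and that the server is skipped for
fail_timeout seconds once max_fails failures fall within fail_timeout of
each other. The function name simulate and its parameters are made up
for illustration.

```python
from collections import deque


def simulate(rate, duration, servers=3, hung=0, connect_timeout=5.0,
             max_fails=3, fail_timeout=10.0):
    """Return arrival times of requests that wait out a connect timeout.

    Hypothetical model of the scenario above, NOT nginx's balancer code.
    Assumes one worker, a fixed request rate, plain round-robin, failures
    recorded only when a connect attempt times out, and the hung server
    skipped for fail_timeout seconds once max_fails failures fall within
    fail_timeout of each other.
    """
    if servers < 2:
        raise ValueError("need at least one healthy server to fail over to")
    pending = deque()   # times at which in-flight connects to `hung` time out
    window = deque()    # failure times counted towards max_fails
    down_until = 0.0    # `hung` is skipped while now < down_until
    affected = []
    rr = 0              # round-robin cursor
    for i in range(int(rate * duration)):
        now = i / rate
        # Record every timeout that has completed by `now`, oldest first.
        while pending and pending[0] <= now:
            failed_at = pending.popleft()
            if failed_at < down_until:
                continue  # already marked down; late failures are ignored
            while window and window[0] <= failed_at - fail_timeout:
                window.popleft()  # drop failures older than fail_timeout
            window.append(failed_at)
            if len(window) >= max_fails:
                down_until = failed_at + fail_timeout
                window.clear()
        # Pick the next server, skipping `hung` while it is marked down.
        server = rr % servers
        rr += 1
        if server == hung and now < down_until:
            server = rr % servers
            rr += 1
        if server == hung:
            # This request blocks for connect_timeout. The model only
            # counts it; its retry on another server is not simulated.
            affected.append(now)
            pending.append(now + connect_timeout)
    return affected


hits = simulate(rate=6, duration=20)
print(len(hits), hits[11], hits[12])
```

With 6 requests per second, every third request goes to 127.0.0.1, so in
this model the 12 requests sent between t=0 and t=5.5 s each wait the
full 5 seconds. The third timeout lands at t=6 s and marks the server
down; at t=16 s it starts receiving traffic again and the cycle from a)
starts over. The number of affected requests grows with the request
rate, which is the N in a).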

Is this correct? If I add "keepalive 64;" to the upstream block, does
the above scenario change? If a server is marked down because no new
connections to it can be established, are all of its persistent
connections destroyed as well?

Any insight on this understanding would be appreciated.

Cheers,
Branden