proxy_upstream_next while no live upstreams

Maxim Dounin mdounin at mdounin.ru
Wed May 10 12:45:19 UTC 2017


Hello!

On Wed, May 10, 2017 at 04:26:06PM +0800, Wu Bingzheng wrote:

> I have an upstream configured with nginx 1.8.1:
> 
>     upstream test {
>       server 192.168.0.5;
>       server 192.168.0.6;
>     }
> 
> Question 1:
> Assume both of the 2 servers are down.
> The first request tries both of them, fails, and responds with 502.
> nginx marks both of them as DOWN. This is OK.
> The second request comes in and finds there are no live upstreams,
> so nginx resets both servers to UP, logs "no live upstreams",
> and returns 502.
> My question is: in the second request, nginx does NOT try
> the 2 servers, but just returns 502 immediately. Is this in line
> with expectations?

Yes, as long as all servers in an upstream group are already 
considered unavailable, nginx will return 502 without trying to 
connect to them.

You may control when servers are considered unavailable using the 
max_fails and fail_timeout parameters of the server directives, 
see here:

http://nginx.org/en/docs/http/ngx_http_upstream_module.html#max_fails 
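
For illustration, here is a minimal sketch of the upstream from the 
question with both parameters written out explicitly.  The values 
shown are the documented defaults (max_fails=1, fail_timeout=10s), so 
this block should behave the same as the original one.  Adjust them 
as needed:

    upstream test {
        # one failed attempt within fail_timeout marks the server as
        # unavailable, and it then stays unavailable for fail_timeout
        server 192.168.0.5 max_fails=1 fail_timeout=10s;
        server 192.168.0.6 max_fails=1 fail_timeout=10s;
    }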

Note well that nginx versions before 1.11.5 reset all servers once 
all servers are unavailable, effectively returning just one 502 
per worker process.  Since nginx 1.11.5, it will wait for 
fail_timeout to expire:

    *) Change: now if there are no available servers in an upstream, nginx
       will not reset number of failures of all servers as it previously
       did, but will wait for fail_timeout to expire.

> Question 2 (not related to Question 1):
> In my production environment, 192.168.0.5 is UP, and 192.168.0.6 
> is DOWN.
> There are a few access log entries with $upstream_addr as 
> "192.168.0.6, test" and $status as 502.
> There were no error log entries about failures connecting to or 
> reading from 192.168.0.5, which means this server is UP, so I think 
> the request should try 192.168.0.5 after 192.168.0.6.
> But it does not try 192.168.0.5; it just logs "no live upstreams" 
> and returns 502.
> There are very few log entries like this, and I cannot reproduce 
> or debug the problem.
> I am just asking here in case someone else knows the problem.

See above, this is exactly what is expected to happen when a 
request to an upstream server fails.  The 502 / "no live upstreams" 
errors you are seeing are the result of all servers being considered 
unavailable.  There are only a few such errors because you are using 
nginx 1.8.1, which quickly resets the failure counters of all servers 
in this situation.  With recent nginx versions, 502 errors will be 
returned until fail_timeout expires.

If you want nginx to completely ignore errors on the only working 
upstream server in your environment, consider using "server ... 
max_fails=0".  Alternatively, consider setting a fail_timeout value 
that is appropriate for your environment.
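
Both options are sketched below as two alternative versions of the 
same upstream, so use one or the other, not both.  The sketch assumes 
192.168.0.5 is the server that should never be marked as unavailable.  
The 5s value is only an example, not a recommendation:

    # alternative 1: failed attempts to 192.168.0.5 are not counted,
    # so it is never considered unavailable
    upstream test {
        server 192.168.0.5 max_fails=0;
        server 192.168.0.6;
    }

    # alternative 2: keep failure accounting, but shorten the period
    # during which failed servers are skipped (the default is 10s)
    upstream test {
        server 192.168.0.5 fail_timeout=5s;
        server 192.168.0.6 fail_timeout=5s;
    }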

-- 
Maxim Dounin
http://nginx.org/
