Configuration tuning for best performance with upstream nodes down

Wed Apr 9 17:30:59 MSD 2008

Hello!

On Wed, Apr 09, 2008 at 09:58:06AM +0200, denis wrote:

>Hi,
>
>I've had some experience with Nginx working only as a reverse proxy
>load balancer for multiple Apache backends. Most of the time
>performance was excellent.
>
>However during a few situations where backends would go down and stay
>down for a longer period of time it would seem Nginx was not behaving
>ideally. It would give lots of error messages about upstream timeouts
>(expected), but it would take a rather long time before switching to the
>next upstream (imho anything that is noticeable by the user is a bit too
>long?).
>
>http://wiki.codemongers.com/NginxHttpUpstreamModule#server
>is a bit confusing to me: fail_timeout and max_fails seem to be the
>parameters to work with, but it appears that tuning fail_timeout down would
>also mean that downed backends are retried sooner?!

Yes.  And to make things clearer: fail_timeout has nothing 
to do with upstream response time, it's just a time frame for 
nginx to remember fails.
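A minimal sketch of where these two parameters live, assuming an upstream named apache_backends and two backends, backend1.example.com and backend2.example.com (all three names are hypothetical). With these values a server is treated as unavailable after 3 failed attempts within 30 seconds, and is then not tried again for those same 30 seconds. Neither value limits how long a single response may take.

```nginx
upstream apache_backends {
    # Hypothetical backend hosts; replace with your own.
    # max_fails:    failed attempts within fail_timeout before the server
    #               is marked unavailable (default 1).
    # fail_timeout: window for counting those fails, and also how long the
    #               server then stays marked unavailable (default 10s).
    server backend1.example.com:8080 max_fails=3 fail_timeout=30s;
    server backend2.example.com:8080 max_fails=3 fail_timeout=30s;
}
```

Lowering fail_timeout shortens the window during which a dead backend is skipped, which is exactly the trade-off described in the question.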

It looks like you really want to tune how fails are detected.  
Take a look at ngx_http_proxy_module configuration, notably 
proxy_connect_timeout and proxy_read_timeout.

http://wiki.codemongers.com/NginxHttpProxyModule#proxy_connect_timeout

Note: proxy_connect_timeout defaults to 60s, which is way too high for 
backends under your control.  I recommend tuning this to something 
like 2s.
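A sketch of that recommendation, assuming the requests are proxied to an upstream group named apache_backends (a hypothetical name). A connection attempt that exceeds the limit counts as a fail for that server, and with the default proxy_next_upstream setting (error timeout) the request is then passed to the next server.

```nginx
location / {
    # Hypothetical upstream name; must match your upstream block.
    proxy_pass http://apache_backends;

    # Give up connecting to a backend after 2 seconds instead of the
    # 60s default, so a dead backend is detected and skipped quickly.
    proxy_connect_timeout 2s;
}
```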

Note: proxy_read_timeout defaults to 60s too, and depending on your 
task this may be either too high or too low.

There is also proxy_send_timeout, but normally it doesn't 
matter since most requests are small and fit into socket 
buffers (so nginx can't distinguish between send and read problems). 
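For backends that normally answer in under 2 seconds, as in the original question, one possible starting point is sketched below. The upstream name apache_backends is hypothetical, and the 10s values are assumptions rather than figures from this thread: proxy_read_timeout has to stay above the slowest legitimate response, otherwise slow but healthy requests will be counted as fails.

```nginx
location / {
    # Hypothetical upstream name; must match your upstream block.
    proxy_pass http://apache_backends;

    proxy_connect_timeout 2s;
    # Maximum time between two successive reads from the backend.
    # Assumed value: keep it above your slowest legitimate response.
    proxy_read_timeout 10s;
    # Rarely matters for small requests; assumed value.
    proxy_send_timeout 10s;
}
```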

Maxim Dounin

>Let us say the scenario has backends that normally respond very quickly
>(<2secs), how would one best tune the appropriate parameters?
>
>
>An alternative approach, which I actually considered at one point,
>would be to automate checks (better checks than pure network connections
>in this case), then remove the failed nodes from the pools in the config
>and force an nginx config reload.
>
>I hope someone can give configuration examples that minimize the impact
>on user experience in failure situations, or otherwise enlighten me ;)
>
>Regards
>--
>Denis