Nginx as Load Balancer Connection Issues
gtuhl
nginx-forum at nginx.us
Mon Jan 23 23:00:20 UTC 2012
gtuhl Wrote:
-------------------------------------------------------
> Initially we were seeing a ton of "connect()
> failed (110: Connection timed out)", 1 every
> couple seconds. I added these to sysctl.conf and
> that seemed to solve the problem:
>
> net.ipv4.tcp_syncookies = 1
> net.ipv4.tcp_fin_timeout = 20
> net.ipv4.tcp_max_syn_backlog = 20480
> net.core.netdev_max_backlog = 4096
> net.ipv4.tcp_max_tw_buckets = 400000
> net.core.somaxconn = 4096
>
> Now things generally run fine, but every once in
> a while we get a huge burst of "upstream
> prematurely closed connection while reading
> response header from upstream" followed by a "no
> live upstreams". Again, no apparent load on the
> machines involved. These bursts only last a
> minute or so. We also still get an occasional
> "connect() failed (110: Connection timed out)" but
> they are far less frequent, perhaps 1 or 2 per
> hour.
>
On looking at this again recently, we made two adjustments that
eliminated the connection issues completely:

net.nf_conntrack_max = 262144
net.ipv4.ip_local_port_range = 1024 65000

After making those two changes things became quite stable. However, we
still have massive numbers of TIME_WAIT connections both on the nginx
machine and on the upstream apache machines.
The nginx machine is accepting roughly 1000 requests/s, and has 40,000
connections in TIME_WAIT.
The apache machines are each accepting roughly 250 requests/s, and have
15,000 connections in TIME_WAIT.
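
A rough way to get counts like the ones above without extra tools is to tally the state column of /proc/net/tcp. The sketch below is only an illustration, not how the figures in this post were gathered; the function name count_tcp_states is made up for the example, and it assumes the usual Linux layout where the fourth whitespace-separated field is the hex state code (06 is TIME_WAIT).

```python
from collections import Counter

# Hex state codes as they appear in /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_text):
    """Tally sockets per TCP state from the text of /proc/net/tcp.

    count_tcp_states is a hypothetical helper name for this sketch.
    """
    counts = Counter()
    for line in proc_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as handle:
            states = count_tcp_states(handle.read())
    except OSError as exc:
        print("could not read /proc/net/tcp:", exc)
    else:
        for state, total in states.most_common():
            print(state, total)
```

This covers IPv4 sockets only; /proc/net/tcp6 has the same state column if IPv6 is in use.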
We tried setting net.ipv4.tcp_tw_reuse to 1 and restarting networking.
That did not cause any trouble, but it also didn't reduce the TIME_WAIT
count. I have read that net.ipv4.tcp_tw_recycle is dangerous, but we may
try it if others have had good experiences.
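
To confirm that a setting such as net.ipv4.tcp_tw_reuse is actually in effect, the running value can be read back from /proc/sys. The sketch below assumes the standard Linux mapping of a dotted sysctl name onto a path under /proc/sys; read_sysctl and its root parameter are hypothetical names for the example.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Return the live value of a dotted sysctl name as a string.

    read_sysctl is a hypothetical helper; 'root' exists only so the
    lookup can be pointed at a different directory when testing.
    """
    path = Path(root).joinpath(*name.split("."))
    return path.read_text().strip()


if __name__ == "__main__":
    for key in ("net.ipv4.tcp_tw_reuse", "net.ipv4.ip_local_port_range"):
        try:
            print(key, "=", read_sysctl(key))
        except OSError as exc:
            print(key, "unavailable:", exc)
```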
Is there a way to have these cleaned up more quickly? My concern is
that even with the expanded ip_local_port_range, 40k is cutting it rather
close. Before we bumped ip_local_port_range, the whole system was
falling down right as the TIME_WAIT count approached 32k. Is it normal
for nginx to cause this many TIME_WAIT connections? If we're only doing
1k requests/s and nearly exhausting the available port range, what would
sites with heavier volume do?
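
For a back-of-the-envelope check on these numbers, the steady-state TIME_WAIT count on a host is roughly the rate at which that host closes connections first, multiplied by how long each socket stays in TIME_WAIT. The sketch below assumes the Linux hold time of 60 seconds and assumes every request uses a fresh connection that the local side closes first; both function names are made up for the example.

```python
# Assumption: Linux keeps a socket in TIME_WAIT for 60 seconds.
TIME_WAIT_SECONDS = 60


def steady_state_time_wait(closes_per_second, hold_seconds=TIME_WAIT_SECONDS):
    """Estimate how many sockets sit in TIME_WAIT at a constant close rate."""
    return closes_per_second * hold_seconds


def ephemeral_port_count(low, high):
    """Number of local ports in an inclusive ip_local_port_range."""
    return high - low + 1


if __name__ == "__main__":
    # Request rates and port range taken from the post above.
    print("apache estimate:", steady_state_time_wait(250))
    print("nginx upper bound:", steady_state_time_wait(1000))
    print("ports available:", ephemeral_port_count(1024, 65000))
```

Under those assumptions, 250 requests/s works out to 15,000, which lines up with the count reported for each apache machine, while 1000 requests/s gives an upper bound of 60,000 against 63,977 ports in the expanded range. This is only an estimate; which side ends up holding the TIME_WAIT socket depends on which side closes first.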
Posted at Nginx Forum: http://forum.nginx.org/read.php?2,220894,221550#msg-221550