Strange $upstream_response_time latency spikes with reverse proxy

Sat Mar 16 23:39:17 UTC 2013

Hello!

On Sat, Mar 16, 2013 at 01:37:22AM -0700, Jay Oster wrote:

> Hi Maxim,
> 
> Thanks for the suggestion! It looks like packet drop is the culprit here.
> The initial SYN packet doesn't receive a corresponding SYN-ACK from the
> upstream servers, so after a 1-second timeout (TCP Retransmission TimeOut),
> the packet is retransmitted. The question is still *why* this only occurs
> through nginx.

Have you tried looking at tcpdump output on both the backend and 
the nginx host?  This might help to further narrow down the problem.

I can see two possible causes here:

1) A trivial one.  The listen queue of your backend service is 
exhausted, and the SYN packet is dropped because of this.  This can 
be easily fixed by using a bigger listen queue, and it is also easy 
enough to track, as listen queue overflow counters are available in 
most OSes.

2) Some other queue in the network stack is exhausted.  This might 
be nontrivial to track down (but it is usually possible).
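
For cause 1), on Linux the listen queue overflow counters are 
exposed in /proc/net/netstat as ListenOverflows and ListenDrops in 
the TcpExt section.  A minimal sketch of reading them, assuming a 
Linux host; the helper names parse_tcpext and listen_queue_counters 
are made up for illustration:

```python
def parse_tcpext(text):
    """Parse the TcpExt section of /proc/net/netstat into a dict.

    The file consists of line pairs: a header line with counter names
    followed by a line with the values, both prefixed with "TcpExt:".
    """
    lines = [l.split() for l in text.splitlines() if l.startswith("TcpExt:")]
    counters = {}
    # Walk the lines two at a time: names first, values second.
    for names, values in zip(lines[0::2], lines[1::2]):
        counters.update(zip(names[1:], (int(v) for v in values[1:])))
    return counters


def listen_queue_counters(path="/proc/net/netstat"):
    """Return (ListenOverflows, ListenDrops) as reported by the kernel."""
    with open(path) as f:
        counters = parse_tcpext(f.read())
    return counters.get("ListenOverflows", 0), counters.get("ListenDrops", 0)
```

Calling listen_queue_counters() before and after a benchmark run and 
comparing the two results should show whether the listen queue 
overflowed during the run.  Note that on Linux the backlog passed to 
listen() is capped by net.core.somaxconn, so both may need to be 
raised.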

> To further narrow down the root cause, I moved my upstream server to the
> same machine with nginx. The issue can still be replicated there. To
> eliminate my upstream server as the cause (it's written in C with libevent,
> by the way) I used the nodejs hello world demo; nodejs has trouble with the
> 5,000 concurrent connections (go figure) but the connections that are
> successful (without nginx reverse proxying) all complete in less than one
> second. When I place nginx between ApacheBench and nodejs, that 1-second
> TCP RTO shows up again.
> 
> To reiterate, this is all happening on a single machine; the TCP stack is
> involved, but not a physical network. The only common denominator is nginx.

Using nginx may result in a different distribution of connection 
attempts to the backend, resulting in bigger bursts of SYN packets 
(especially if you use settings like multi_accept).
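
To illustrate why the distribution matters, here is a toy model of a 
bounded listen queue.  It is only a sketch: the function name 
simulate_drops, the tick-based timing and all of the numbers are 
assumptions chosen for illustration, not measurements and not a 
model of the real kernel.

```python
def simulate_drops(arrivals, backlog, accepts_per_tick):
    """Count connection attempts dropped by a bounded listen queue.

    arrivals         -- number of SYNs arriving in each time tick
    backlog          -- maximum number of queued connections
    accepts_per_tick -- connections the backend accepts per tick
    """
    queued = 0
    dropped = 0
    for syns in arrivals:
        # SYNs that do not fit into the queue are dropped.
        room = backlog - queued
        admitted = min(syns, room)
        dropped += syns - admitted
        queued += admitted
        # The backend drains the queue at a fixed rate.
        queued -= min(queued, accepts_per_tick)
    return dropped


# 1000 connection attempts, spread evenly vs. sent in two bursts.
smooth = [100] * 10
bursty = [500, 0, 0, 0, 0, 500, 0, 0, 0, 0]

print(simulate_drops(smooth, backlog=128, accepts_per_tick=100))  # 0
print(simulate_drops(bursty, backlog=128, accepts_per_tick=100))  # 744
```

The total number of connection attempts and the backend's accept 
rate are the same in both cases; only the burstiness differs.  In 
the real stack each dropped SYN is retransmitted after the RTO, 
which would show up as the 1-second delay.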

-- 
Maxim Dounin
http://nginx.org/en/donation.html