Strange $upstream_response_time latency spikes with reverse proxy
jay at kodewerx.org
Sun Mar 17 09:23:20 UTC 2013
Hi again, Maxim!
On Mar 16, 2013, at 4:39 PM, Maxim Dounin <mdounin at mdounin.ru> wrote:
> On Sat, Mar 16, 2013 at 01:37:22AM -0700, Jay Oster wrote:
>> Hi Maxim,
>> Thanks for the suggestion! It looks like packet drop is the culprit here.
>> The initial SYN packet doesn't receive a corresponding SYN-ACK from the
>> upstream servers, so after a 1-second timeout (TCP Retransmission TimeOut),
>> the packet is retransmitted. The question is still *why* this only occurs
>> through nginx.
> Have you tried looking at tcpdump output on both the backend and the nginx host?
> This might help to further narrow down the problem.
I haven't yet, but I will restart my investigation there on Monday. Capturing packets on both sides during the same time frame may reveal something I haven't seen yet.
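Alongside the packet captures, the retransmitted SYNs should also be visible from userland as handshakes that take at least one RTO to complete. Below is a minimal Python sketch of that check. The function names `time_connect` and `count_slow` are hypothetical, and the 1-second threshold is simply the timeout observed earlier in this thread, not a measured constant for every system.

```python
import socket
import time


def time_connect(host, port, timeout=5.0):
    """Return the seconds taken to complete a TCP handshake with host:port."""
    start = time.monotonic()
    # create_connection returns once the handshake has completed
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.monotonic() - start


def count_slow(durations, rto=1.0):
    """Count handshakes that took at least `rto` seconds.

    A handshake this slow on a local link suggests the first SYN was
    dropped and had to be retransmitted.
    """
    return sum(1 for d in durations if d >= rto)
```

Calling `time_connect` in a loop against the upstream port while ApacheBench is running, then passing the results to `count_slow`, gives a rough count of affected connections to compare against the capture.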
> I could see two possible causes here:
> 1) A trivial one. The listen queue of your backend service is
> exhausted, and the SYN packet is dropped because of this. This can be
> easily fixed by using a bigger listen queue, and it is also easy enough to
> track, as there are listen queue overflow counters available in
> most OSes.
The listen queue is configured to 1024 on these hosts, but increasing it changes nothing. I can, however, make the delay much longer by making the queue smaller.
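For reference, here is a rough sketch of how the listen queue overflow counters mentioned above could be sampled around a test run. It assumes a Linux host, where `/proc/net/netstat` exposes the `ListenOverflows` and `ListenDrops` counters in its `TcpExt` group; other OSes expose them differently. The helper names are hypothetical.

```python
import time


def parse_tcpext(text):
    """Return the TcpExt counters from /proc/net/netstat-style text as a dict.

    The file holds two lines per group: counter names, then their values.
    """
    rows = [ln.split() for ln in text.splitlines() if ln.startswith("TcpExt:")]
    if len(rows) < 2:
        return {}
    names, values = rows[0][1:], rows[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def listen_counters(path="/proc/net/netstat"):
    """Return (ListenOverflows, ListenDrops) read from `path` (Linux only)."""
    with open(path) as f:
        counters = parse_tcpext(f.read())
    return counters.get("ListenOverflows", 0), counters.get("ListenDrops", 0)


def measure(seconds=10):
    """Return how much each counter grew over `seconds` (run the test meanwhile)."""
    before = listen_counters()
    time.sleep(seconds)
    after = listen_counters()
    return after[0] - before[0], after[1] - before[1]
```

If `measure()` returns zeros while the 1-second delays are still occurring, the listen queue is probably not where the SYNs are being dropped.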
> 2) Some other queue in the network stack is exhausted. This might
> be nontrivial to track (but usually possible too).
This is interesting, and could very well be it! Do you have any suggestions on where to start looking?
>> To further narrow down the root cause, I moved my upstream server to the
>> same machine with nginx. The issue can still be replicated there. To
>> eliminate my upstream server as the cause (it's written in C with libevent,
>> by the way) I used the nodejs hello world demo; nodejs has trouble with the
>> 5,000 concurrent connections (go figure) but the connections that are
>> successful (without nginx reverse proxying) all complete in less than one
>> second. When I place nginx between ApacheBench and nodejs, that 1-second
>> TCP RTO shows up again.
>> To reiterate, this is all happening on a single machine; the TCP stack is
>> involved, but not a physical network. The only common denominator is nginx.
> Use of nginx may result in a different distribution of connection
> attempts to a backend, resulting in bigger SYN packet bursts
> (especially if you use settings like multi_accept).
Got it. I don't think multi_accept is being used (it's not in the nginx config).
> Maxim Dounin