Strange $upstream_response_time latency spikes with reverse proxy

Jason Oster jay at kodewerx.org
Sun Mar 17 09:23:20 UTC 2013


Hi again, Maxim!

On Mar 16, 2013, at 4:39 PM, Maxim Dounin <mdounin at mdounin.ru> wrote:

> Hello!
> 
> On Sat, Mar 16, 2013 at 01:37:22AM -0700, Jay Oster wrote:
> 
>> Hi Maxim,
>> 
>> Thanks for the suggestion! It looks like packet drop is the culprit here.
>> The initial SYN packet doesn't receive a corresponding SYN-ACK from the
>> upstream servers, so after a 1-second timeout (TCP Retransmission TimeOut),
>> the packet is retransmitted. The question is still *why* this only occurs
>> through nginx.
> 
> Have you tried looking at tcpdump on both the backend and the nginx host?  
> This might help to further narrow down the problem.

I haven't yet, but I will restart my investigation there on Monday. Capturing packets on both sides during the same time frame may reveal something I haven't seen yet.
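
Once both captures exist, the retransmitted SYNs can be picked out the same way on each side. Below is a minimal sketch, assuming the SYN packets (SYN set, ACK clear) have already been extracted from each capture into tuples of (timestamp, source IP, source port, destination IP, destination port). The function name, the tuple layout, and the 0.9-second threshold are illustrative assumptions, not part of tcpdump or nginx.

```python
from collections import defaultdict


def find_syn_retransmits(syns, min_gap=0.9):
    """Return (flow, first_timestamp, gap) for SYNs repeated on the same flow.

    syns: iterable of (timestamp_seconds, src_ip, src_port, dst_ip, dst_port),
          one entry per captured packet with SYN set and ACK clear
          (hypothetical, pre-extracted from a capture).
    min_gap: smallest gap in seconds treated as a retransmission; 0.9 is an
             assumed threshold chosen to catch a 1-second timeout.
    """
    by_flow = defaultdict(list)
    for ts, src_ip, src_port, dst_ip, dst_port in syns:
        by_flow[(src_ip, src_port, dst_ip, dst_port)].append(ts)

    retransmits = []
    for flow, stamps in by_flow.items():
        stamps.sort()
        # Compare each SYN with the next one seen on the same flow.
        for earlier, later in zip(stamps, stamps[1:]):
            gap = later - earlier
            if gap >= min_gap:
                retransmits.append((flow, earlier, gap))
    # Oldest first, so the output lines up with the capture timeline.
    return sorted(retransmits, key=lambda item: item[1])
```

Running it over both captures separates two cases. If the backend-side capture contains the first SYN of a flagged flow, the packet arrived and was dropped inside the backend's stack. If only the retransmitted SYN appears there, the first one was lost before it reached the backend. A source port reused long after the first connection would also be flagged, so very large gaps deserve a second look.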

> I could see two possible causes here:
> 
> 1) A trivial one.  Listen queue of your backend service is 
> exhausted, and the SYN packet is dropped due to this.  This can be 
> easily fixed by using a bigger listen queue, and is also easy enough 
> to track as there are listen queue overflow counters available in 
> most OSes.

The listen queue is configured to 1024 on these hosts, though nothing changes when I increase it. I can, however, make the delay much longer by making the queue smaller.
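
For reference, here is a minimal sketch of how the listen queue overflow counters could be read, assuming the hosts run Linux (the thread does not say so). On Linux the counters are the ListenOverflows and ListenDrops fields of the TcpExt section in /proc/net/netstat. The function names are illustrative, and the parser assumes the file's usual layout of a header line followed by a matching value line.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {"Section.Name": int}.

    Assumes the usual layout: a header line ("TcpExt: Name1 Name2 ...")
    immediately followed by a value line ("TcpExt: 1 2 ...").
    """
    counters = {}
    lines = text.splitlines()
    for header, values in zip(lines[0::2], lines[1::2]):
        section, _, names = header.partition(":")
        value_section, _, numbers = values.partition(":")
        if section != value_section:
            continue  # Not a header/value pair; skip rather than guess.
        for name, number in zip(names.split(), numbers.split()):
            counters[f"{section}.{name}"] = int(number)
    return counters


def listen_queue_counters(path="/proc/net/netstat"):
    """Return the two listen queue counters, or None where a field is missing."""
    with open(path) as handle:
        counters = parse_netstat(handle.read())
    return {
        name: counters.get(f"TcpExt.{name}")
        for name in ("ListenOverflows", "ListenDrops")
    }
```

Calling listen_queue_counters() before and after an ApacheBench run and comparing the two results would show whether the listen queue overflowed during the test. The counters are system-wide, so other listening sockets on the same host also contribute to them.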

> 2) Some other queue in the network stack is exhausted.  This might 
> be nontrivial to track (but usually possible too).

This is interesting, and could very well be it! Do you have any suggestions on where to start looking?
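
One possible starting point is to record the configured limits of the queues a SYN passes through. Below is a minimal sketch, again assuming Linux, that reads three sysctls from /proc/sys: net.core.somaxconn, net.ipv4.tcp_max_syn_backlog, and net.core.netdev_max_backlog. These are only candidates chosen for illustration; nothing in this thread identifies any of them as the cause. The function name and the root parameter are hypothetical.

```python
from pathlib import Path

# Candidate queue limits on Linux (an assumed list, chosen for illustration).
QUEUE_SYSCTLS = (
    "net/core/somaxconn",            # upper bound on a socket's listen backlog
    "net/ipv4/tcp_max_syn_backlog",  # limit on half-open (SYN received) requests
    "net/core/netdev_max_backlog",   # per-CPU input packet backlog
)


def read_queue_sysctls(root="/proc/sys", names=QUEUE_SYSCTLS):
    """Return {"net.core.somaxconn": int or None, ...} for each sysctl in names.

    A value of None means the file was missing or unreadable, for example on
    a non-Linux host.
    """
    values = {}
    for name in names:
        key = name.replace("/", ".")
        try:
            values[key] = int((Path(root) / name).read_text().split()[0])
        except (OSError, ValueError, IndexError):
            values[key] = None
    return values
```

The output of read_queue_sysctls() only shows the configured limits, not whether any of them was actually hit. It is most useful alongside the drop counters, to judge whether a limit is small compared with the 5,000 concurrent connections used in the test.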

> 
>> To further narrow down the root cause, I moved my upstream server to the
>> same machine with nginx. The issue can still be replicated there. To
>> eliminate my upstream server as the cause (it's written in C with libevent,
>> by the way) I used the nodejs hello world demo; nodejs has trouble with the
>> 5,000 concurrent connections (go figure) but the connections that are
>> successful (without nginx reverse proxying) all complete in less than one
>> second. When I place nginx between ApacheBench and nodejs, that 1-second
>> TCP RTO shows up again.
>> 
>> To reiterate, this is all happening on a single machine; the TCP stack is
>> involved, but not a physical network. The only common denominator is nginx.
> 
> Use of nginx may result in a different distribution of connection 
> attempts to a backend, resulting in bigger SYN packet bursts 
> (especially if you use settings like multi_accept).

Got it. I don't think multi_accept is being used (it's not in the nginx config).

Thank you.

> -- 
> Maxim Dounin
> http://nginx.org/en/donation.html
> 
> _______________________________________________
> nginx mailing list
> nginx at nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx


