Strange $upstream_response_time latency spikes with reverse proxy

Andrew Alexeev andrew at nginx.com
Sat Mar 16 15:05:00 UTC 2013


Jay,

On Mar 16, 2013, at 12:37 PM, Jay Oster wrote:

> Hi Maxim,
> 
> Thanks for the suggestion! It looks like packet drop is the culprit here. The initial SYN packet doesn't receive a corresponding SYN-ACK from the upstream servers, so after a 1-second timeout (TCP Retransmission TimeOut), the packet is retransmitted. The question is still *why* this only occurs through nginx.
> 
> To further narrow down the root cause, I moved my upstream server to the same machine with nginx. The issue can still be replicated there. To eliminate my upstream server as the cause (it's written in C with libevent, by the way) I used the nodejs hello world demo; nodejs has trouble with the 5,000 concurrent connections (go figure) but the connections that are successful (without nginx reverse proxying) all complete in less than one second. When I place nginx between ApacheBench and nodejs, that 1-second TCP RTO shows up again.

You mean you keep seeing SYN-ACK loss through loopback?

That might sound funny, but what's the OS and the overall environment of that strangely behaving machine with nginx? Is it a virtualized one? Is the other machine any different? The more details you can provide, the better :)

> To reiterate, this is all happening on a single machine; the TCP stack is involved, but not a physical network. The only common denominator is nginx.

Can you try the same tests on the other machine, where you originally didn't have any problems with your application? That is, can you repeat nginx+app on the other machine and see if the above strange behavior persists?

> On Fri, Mar 15, 2013 at 1:20 AM, Maxim Dounin <mdounin at mdounin.ru> wrote:
> Hello!
> 
> On Thu, Mar 14, 2013 at 07:07:20PM -0700, Jay Oster wrote:
> 
> [...]
> 
> > The access log has 10,000 lines total (i.e. two of these tests with 5,000
> > concurrent connections), and when I sort by upstream_response_time, I get a
> > log with the first 140 lines having about 1s on the upstream_response_time,
> > and the remaining 9,860 lines show 700ms and less. Here's a snippet showing
> > the strangeness, starting with line numbers:
> >
> >
> > 1:     127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
> > 200 19 "-" "ApacheBench/2.3" 1.027 1.026 234 83
> > 2:     127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
> > 200 19 "-" "ApacheBench/2.3" 1.027 1.026 234 83
> > 3:     127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
> > 200 19 "-" "ApacheBench/2.3" 1.026 1.025 234 83
> > ...
> > 138:   127.0.0.1 - - [14/Mar/2013:17:57:18 -0700] "GET /time/0 HTTP/1.0"
> > 200 19 "-" "ApacheBench/2.3" 1.000 0.999 234 81
> > 139:   127.0.0.1 - - [14/Mar/2013:17:57:18 -0700] "GET /time/0 HTTP/1.0"
> > 200 19 "-" "ApacheBench/2.3" 0.999 0.999 234 81
> > 140:   127.0.0.1 - - [14/Mar/2013:17:57:18 -0700] "GET /time/0 HTTP/1.0"
> > 200 19 "-" "ApacheBench/2.3" 0.999 0.999 234 81
> > 141:   127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
> > 200 19 "-" "ApacheBench/2.3" 0.708 0.568 234 83
> > 142:   127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
> > 200 19 "-" "ApacheBench/2.3" 0.708 0.568 234 83
> > 143:   127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
> > 200 19 "-" "ApacheBench/2.3" 0.708 0.568 234 83
> > ...
> > 9998:  127.0.0.1 - - [14/Mar/2013:17:57:16 -0700] "GET /time/0 HTTP/1.0"
> > 200 19 "-" "ApacheBench/2.3" 0.142 0.005 234 81
> > 9999:  127.0.0.1 - - [14/Mar/2013:17:57:16 -0700] "GET /time/0 HTTP/1.0"
> > 200 19 "-" "ApacheBench/2.3" 0.142 0.005 234 81
> > 10000: 127.0.0.1 - - [14/Mar/2013:17:57:16 -0700] "GET /time/0 HTTP/1.0"
> > 200 19 "-" "ApacheBench/2.3" 0.122 0.002 234 81
> >
> >
> >
> > The upstream_response_time difference between line 140 and 141 is nearly
> > 500ms! The total request_time also displays an interesting gap of almost
> > 300ms. What's going on here?
> 
> I would suggests there are packet loss and retransmits for some
> reason.  Try tcpdump'ing traffic between nginx and backends to see
> what goes on in details.
> 
> --
> Maxim Dounin
> http://nginx.org/en/donation.html
> 
> _______________________________________________
> nginx mailing list
> nginx at nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx
> 
> _______________________________________________
> nginx mailing list
> nginx at nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.nginx.org/pipermail/nginx/attachments/20130316/4d5af508/attachment.html>


More information about the nginx mailing list