Strange $upstream_response_time latency spikes with reverse proxy

Maxim Dounin mdounin at mdounin.ru
Fri Mar 15 08:20:59 UTC 2013


Hello!

On Thu, Mar 14, 2013 at 07:07:20PM -0700, Jay Oster wrote:

[...]

> The access log has 10,000 lines total (i.e. two of these tests with 5,000
> concurrent connections), and when I sort by upstream_response_time, I get a
> log with the first 140 lines having about 1s on the upstream_response_time,
> and the remaining 9,860 lines show 700ms and less. Here's a snippet showing
> the strangeness, starting with line numbers:
> 
> 
> 1:     127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
> 200 19 "-" "ApacheBench/2.3" 1.027 1.026 234 83
> 2:     127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
> 200 19 "-" "ApacheBench/2.3" 1.027 1.026 234 83
> 3:     127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
> 200 19 "-" "ApacheBench/2.3" 1.026 1.025 234 83
> ...
> 138:   127.0.0.1 - - [14/Mar/2013:17:57:18 -0700] "GET /time/0 HTTP/1.0"
> 200 19 "-" "ApacheBench/2.3" 1.000 0.999 234 81
> 139:   127.0.0.1 - - [14/Mar/2013:17:57:18 -0700] "GET /time/0 HTTP/1.0"
> 200 19 "-" "ApacheBench/2.3" 0.999 0.999 234 81
> 140:   127.0.0.1 - - [14/Mar/2013:17:57:18 -0700] "GET /time/0 HTTP/1.0"
> 200 19 "-" "ApacheBench/2.3" 0.999 0.999 234 81
> 141:   127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
> 200 19 "-" "ApacheBench/2.3" 0.708 0.568 234 83
> 142:   127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
> 200 19 "-" "ApacheBench/2.3" 0.708 0.568 234 83
> 143:   127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
> 200 19 "-" "ApacheBench/2.3" 0.708 0.568 234 83
> ...
> 9998:  127.0.0.1 - - [14/Mar/2013:17:57:16 -0700] "GET /time/0 HTTP/1.0"
> 200 19 "-" "ApacheBench/2.3" 0.142 0.005 234 81
> 9999:  127.0.0.1 - - [14/Mar/2013:17:57:16 -0700] "GET /time/0 HTTP/1.0"
> 200 19 "-" "ApacheBench/2.3" 0.142 0.005 234 81
> 10000: 127.0.0.1 - - [14/Mar/2013:17:57:16 -0700] "GET /time/0 HTTP/1.0"
> 200 19 "-" "ApacheBench/2.3" 0.122 0.002 234 81
> 
> 
> 
> The upstream_response_time difference between line 140 and 141 is nearly
> 500ms! The total request_time also displays an interesting gap of almost
> 300ms. What's going on here?

I would suggests there are packet loss and retransmits for some 
reason.  Try tcpdump'ing traffic between nginx and backends to see 
what goes on in details.

-- 
Maxim Dounin
http://nginx.org/en/donation.html



More information about the nginx mailing list