<div dir="ltr"><div>Hi list!</div><div><br></div><div>I'm trying to debug an interesting problem where we randomly get a "high latency" response time from our upstream servers. It appears to occur in about 1.5% of all requests. Here's a description of the tests I've been running to isolate the problem within nginx:</div>
<div><br></div><div>I'm using an endpoint on the upstream servers that operates extremely quickly; a request which only responds back with the server's current local UNIX timestamp. From the nginx server, I start ApacheBench with 5,000 concurrent connections directly to the upstream server (bypassing nginx). Here is what a typical run of this test looks like:</div>
<div><br></div><div><br></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div style><font face="courier new, monospace" size="1">Document Path: /time/0</font></div><div style><font face="courier new, monospace" size="1">Document Length: 19 bytes</font></div>
<div style><font face="courier new, monospace" size="1"><br></font></div><div style><font face="courier new, monospace" size="1">Concurrency Level: 5000</font></div><div style><font face="courier new, monospace" size="1">Time taken for tests: 0.756 seconds</font></div>
<div style><font face="courier new, monospace" size="1">Complete requests: 5000</font></div><div style><font face="courier new, monospace" size="1">Failed requests: 0</font></div><div style><font face="courier new, monospace" size="1">Write errors: 0</font></div>
<div style><font face="courier new, monospace" size="1">Total transferred: 1110000 bytes</font></div><div style><font face="courier new, monospace" size="1">HTML transferred: 95000 bytes</font></div><div style>
<font face="courier new, monospace" size="1">Requests per second: 6617.33 [#/sec] (mean)</font></div><div style><font face="courier new, monospace" size="1">Time per request: 755.592 [ms] (mean)</font></div><div style>
<font face="courier new, monospace" size="1">Time per request: 0.151 [ms] (mean, across all concurrent requests)</font></div><div style><font face="courier new, monospace" size="1">Transfer rate: 1434.62 [Kbytes/sec] received</font></div>
<div style><font face="courier new, monospace" size="1"><br></font></div><div style><font face="courier new, monospace" size="1">Connection Times (ms)</font></div><div style><font face="courier new, monospace" size="1"> min mean[+/-sd] median max</font></div>
<div style><font face="courier new, monospace" size="1">Connect: 4 63 53.7 35 167</font></div><div style><font face="courier new, monospace" size="1">Processing: 22 44 19.1 38 249</font></div>
<div style><font face="courier new, monospace" size="1">Waiting: 17 35 18.8 30 243</font></div><div style><font face="courier new, monospace" size="1">Total: 55 107 64.4 73 401</font></div>
<div style><font face="courier new, monospace" size="1"><br></font></div><div style><font face="courier new, monospace" size="1">Percentage of the requests served within a certain time (ms)</font></div><div style><font face="courier new, monospace" size="1"> 50% 73</font></div>
<div style><font face="courier new, monospace" size="1"> 66% 77</font></div><div style><font face="courier new, monospace" size="1"> 75% 84</font></div><div style><font face="courier new, monospace" size="1"> 80% 202</font></div>
<div style><font face="courier new, monospace" size="1"> 90% 222</font></div><div style><font face="courier new, monospace" size="1"> 95% 231</font></div><div style><font face="courier new, monospace" size="1"> 98% 250</font></div>
<div style><font face="courier new, monospace" size="1"> 99% 251</font></div><div style><font face="courier new, monospace" size="1"> 100% 401 (longest request)</font></div></blockquote><div><br></div><div><br></div>
<div>And here's the same test with the longest response times I've seen:</div><div><br></div><div><br></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div style><font face="courier new, monospace" size="1">Document Path: /time/0</font></div>
<div style><font face="courier new, monospace" size="1">Document Length: 19 bytes</font></div><div style><font face="courier new, monospace" size="1"><br></font></div><div style><font face="courier new, monospace" size="1">Concurrency Level: 5000</font></div>
<div style><font face="courier new, monospace" size="1">Time taken for tests: 0.807 seconds</font></div><div style><font face="courier new, monospace" size="1">Complete requests: 5000</font></div><div style><font face="courier new, monospace" size="1">Failed requests: 0</font></div>
<div style><font face="courier new, monospace" size="1">Write errors: 0</font></div><div style><font face="courier new, monospace" size="1">Total transferred: 1110000 bytes</font></div><div style><font face="courier new, monospace" size="1">HTML transferred: 95000 bytes</font></div>
<div style><font face="courier new, monospace" size="1">Requests per second: 6197.08 [#/sec] (mean)</font></div><div style><font face="courier new, monospace" size="1">Time per request: 806.831 [ms] (mean)</font></div>
<div style><font face="courier new, monospace" size="1">Time per request: 0.161 [ms] (mean, across all concurrent requests)</font></div><div style><font face="courier new, monospace" size="1">Transfer rate: 1343.51 [Kbytes/sec] received</font></div>
<div style><font face="courier new, monospace" size="1"><br></font></div><div style><font face="courier new, monospace" size="1">Connection Times (ms)</font></div><div style><font face="courier new, monospace" size="1"> min mean[+/-sd] median max</font></div>
<div style><font face="courier new, monospace" size="1">Connect: 3 45 51.8 17 144</font></div><div style><font face="courier new, monospace" size="1">Processing: 10 29 24.4 23 623</font></div>
<div style><font face="courier new, monospace" size="1">Waiting: 9 25 24.4 18 623</font></div><div style><font face="courier new, monospace" size="1">Total: 26 75 67.4 39 626</font></div>
<div style><font face="courier new, monospace" size="1"><br></font></div><div style><font face="courier new, monospace" size="1">Percentage of the requests served within a certain time (ms)</font></div><div style><font face="courier new, monospace" size="1"> 50% 39</font></div>
<div style><font face="courier new, monospace" size="1"> 66% 42</font></div><div style><font face="courier new, monospace" size="1"> 75% 45</font></div><div style><font face="courier new, monospace" size="1"> 80% 173</font></div>
<div style><font face="courier new, monospace" size="1"> 90% 190</font></div><div style><font face="courier new, monospace" size="1"> 95% 199</font></div><div style><font face="courier new, monospace" size="1"> 98% 217</font></div>
<div style><font face="courier new, monospace" size="1"> 99% 218</font></div><div style><font face="courier new, monospace" size="1"> 100% 626 (longest request)</font></div></blockquote><div><br></div><div><br></div>
<div>Not bad. Now, keep in mind, this is a SINGLE upstream server handling these requests over the network. Once I change my test to point ab at the local nginx, the strange latency issue rears its ugly head. I have 4 upstream servers in my config.</div>
<div><br></div><div>Here's what the same test through nginx looks like:</div><div><br></div><div><br></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div style><font face="courier new, monospace" size="1">Concurrency Level: 5000</font></div>
<div style><font face="courier new, monospace" size="1">Time taken for tests: 1.602 seconds</font></div><div style><font face="courier new, monospace" size="1">Complete requests: 5000</font></div><div style><font face="courier new, monospace" size="1">Failed requests: 0</font></div>
<div style><font face="courier new, monospace" size="1">Write errors: 0</font></div><div style><font face="courier new, monospace" size="1">Total transferred: 1170000 bytes</font></div><div style><font face="courier new, monospace" size="1">HTML transferred: 95000 bytes</font></div>
<div style><font face="courier new, monospace" size="1">Requests per second: 3121.08 [#/sec] (mean)</font></div><div style><font face="courier new, monospace" size="1">Time per request: 1602.012 [ms] (mean)</font></div>
<div style><font face="courier new, monospace" size="1">Time per request: 0.320 [ms] (mean, across all concurrent requests)</font></div><div style><font face="courier new, monospace" size="1">Transfer rate: 713.21 [Kbytes/sec] received</font></div>
<div style><font face="courier new, monospace" size="1"><br></font></div><div style><font face="courier new, monospace" size="1">Connection Times (ms)</font></div><div style><font face="courier new, monospace" size="1"> min mean[+/-sd] median max</font></div>
<div style><font face="courier new, monospace" size="1">Connect: 109 172 39.4 167 246</font></div><div style><font face="courier new, monospace" size="1">Processing: 106 505 143.3 530 1248</font></div>
<div style><font face="courier new, monospace" size="1">Waiting: 103 504 143.5 530 1248</font></div><div style><font face="courier new, monospace" size="1">Total: 344 677 108.6 696 1358</font></div>
<div style><font face="courier new, monospace" size="1"><br></font></div><div style><font face="courier new, monospace" size="1">Percentage of the requests served within a certain time (ms)</font></div><div style><font face="courier new, monospace" size="1"> 50% 696</font></div>
<div style><font face="courier new, monospace" size="1"> 66% 723</font></div><div style><font face="courier new, monospace" size="1"> 75% 741</font></div><div style><font face="courier new, monospace" size="1"> 80% 752</font></div>
<div style><font face="courier new, monospace" size="1"> 90% 768</font></div><div style><font face="courier new, monospace" size="1"> 95% 779</font></div><div style><font face="courier new, monospace" size="1"> 98% 786</font></div>
<div style><font face="courier new, monospace" size="1"> 99% 788</font></div><div style><font face="courier new, monospace" size="1"> 100% 1358 (longest request)</font></div></blockquote><div><br></div><div><br></div>
<div>Ack! It's like nginx decides to drop an extra second on some requests for no reason.</div><div><br></div><div>I've also recorded these test runs with nginx's access log. Here's the log format, first:</div>
<div><br></div><div><br></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div style><font face="courier new, monospace" size="1"> log_format main '$remote_addr - - ' ## User's IP Address</font></div>
<div style><font face="courier new, monospace" size="1"> '[$time_local] ' ## DateTime</font></div><div style><font face="courier new, monospace" size="1"> '"$request" ' ## User's Request URL</font></div>
<div style><font face="courier new, monospace" size="1"> '$status ' ## HTTP Code</font></div><div style><font face="courier new, monospace" size="1"> '$body_bytes_sent ' ## Bytes BODY ONLY</font></div>
<div style><font face="courier new, monospace" size="1"> '"$http_referer" ' ## User's Referer</font></div><div style><font face="courier new, monospace" size="1"> '"$http_user_agent" ' ## User's Agent</font></div>
<div style><font face="courier new, monospace" size="1"> '$request_time ' ## NginX Response</font></div><div style><font face="courier new, monospace" size="1"> '$upstream_response_time ' ## Upstream Response</font></div>
<div style><font face="courier new, monospace" size="1"> '$bytes_sent ' ## Bytes Sent (GZIP)</font></div><div style><font face="courier new, monospace" size="1"> '$request_length'; ## String Length</font></div>
</blockquote><div><br></div><div><br></div><div>The access log has 10,000 lines total (i.e. two of these tests with 5,000 concurrent connections), and when I sort by upstream_response_time, I get a log with the first 140 lines having about 1s on the upstream_response_time, and the remaining 9,860 lines show 700ms and less. Here's a snippet showing the strangeness, starting with line numbers:</div>
<div><br></div><div><br></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div style><font face="courier new, monospace" size="1">1: 127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0" 200 19 "-" "ApacheBench/2.3" 1.027 1.026 234 83</font></div>
<div style><font face="courier new, monospace" size="1">2: 127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0" 200 19 "-" "ApacheBench/2.3" 1.027 1.026 234 83</font></div><div style>
<font face="courier new, monospace" size="1">3: 127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0" 200 19 "-" "ApacheBench/2.3" 1.026 1.025 234 83</font></div><div style><font face="courier new, monospace" size="1">...</font></div>
<div style><font face="courier new, monospace" size="1">138: 127.0.0.1 - - [14/Mar/2013:17:57:18 -0700] "GET /time/0 HTTP/1.0" 200 19 "-" "ApacheBench/2.3" 1.000 0.999 234 81</font></div><div style>
<font face="courier new, monospace" size="1">139: 127.0.0.1 - - [14/Mar/2013:17:57:18 -0700] "GET /time/0 HTTP/1.0" 200 19 "-" "ApacheBench/2.3" 0.999 0.999 234 81</font></div><div style><font face="courier new, monospace" size="1">140: 127.0.0.1 - - [14/Mar/2013:17:57:18 -0700] "GET /time/0 HTTP/1.0" 200 19 "-" "ApacheBench/2.3" 0.999 0.999 234 81</font></div>
<div style><font face="courier new, monospace" size="1">141: 127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0" 200 19 "-" "ApacheBench/2.3" 0.708 0.568 234 83</font></div><div style>
<font face="courier new, monospace" size="1">142: 127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0" 200 19 "-" "ApacheBench/2.3" 0.708 0.568 234 83</font></div><div style><font face="courier new, monospace" size="1">143: 127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0" 200 19 "-" "ApacheBench/2.3" 0.708 0.568 234 83</font></div>
<div style><font face="courier new, monospace" size="1">...</font></div><div style><font face="courier new, monospace" size="1">9998: 127.0.0.1 - - [14/Mar/2013:17:57:16 -0700] "GET /time/0 HTTP/1.0" 200 19 "-" "ApacheBench/2.3" 0.142 0.005 234 81</font></div>
<div style><font face="courier new, monospace" size="1">9999: 127.0.0.1 - - [14/Mar/2013:17:57:16 -0700] "GET /time/0 HTTP/1.0" 200 19 "-" "ApacheBench/2.3" 0.142 0.005 234 81</font></div><div style>
<font face="courier new, monospace" size="1">10000: 127.0.0.1 - - [14/Mar/2013:17:57:16 -0700] "GET /time/0 HTTP/1.0" 200 19 "-" "ApacheBench/2.3" 0.122 0.002 234 81</font></div></blockquote>
<div><br></div><div><br></div><div>The upstream_response_time difference between line 140 and 141 is nearly 500ms! The total request_time also displays an interesting gap of almost 300ms. What's going on here?</div><div>
<br></div><div>The kernels have been tuned on all servers for a high number of open files, and tcp buffers:</div><div><br></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div style><div style><font face="courier new, monospace" size="1">$ ulimit -a</font></div>
</div><div style><div style><font face="courier new, monospace" size="1">core file size (blocks, -c) unlimited</font></div></div><div style><div style><font face="courier new, monospace" size="1">data seg size (kbytes, -d) unlimited</font></div>
</div><div style><div style><font face="courier new, monospace" size="1">scheduling priority (-e) 0</font></div></div><div style><div style><font face="courier new, monospace" size="1">file size (blocks, -f) unlimited</font></div>
</div><div style><div style><font face="courier new, monospace" size="1">pending signals (-i) 119715</font></div></div><div style><div style><font face="courier new, monospace" size="1">max locked memory (kbytes, -l) 64</font></div>
</div><div style><div style><font face="courier new, monospace" size="1">max memory size (kbytes, -m) unlimited</font></div></div><div style><div style><font face="courier new, monospace" size="1">open files (-n) 1048576</font></div>
</div><div style><div style><font face="courier new, monospace" size="1">pipe size (512 bytes, -p) 8</font></div></div><div style><div style><font face="courier new, monospace" size="1">POSIX message queues (bytes, -q) 819200</font></div>
</div><div style><div style><font face="courier new, monospace" size="1">real-time priority (-r) 0</font></div></div><div style><div style><font face="courier new, monospace" size="1">stack size (kbytes, -s) 8192</font></div>
</div><div style><div style><font face="courier new, monospace" size="1">cpu time (seconds, -t) unlimited</font></div></div><div style><div style><font face="courier new, monospace" size="1">max user processes (-u) 119715</font></div>
</div><div style><div style><font face="courier new, monospace" size="1">virtual memory (kbytes, -v) unlimited</font></div></div><div style><div style><font face="courier new, monospace" size="1">file locks (-x) unlimited</font></div>
</div></blockquote><div><br></div><div><div><br></div></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div style><div style><font face="courier new, monospace" size="1">$ cat /proc/sys/net/core/*mem_*</font></div>
</div><div style><div style><font face="courier new, monospace" size="1">184217728</font></div></div><div style><div style><font face="courier new, monospace" size="1">184217728</font></div></div><div style><div style><div style>
<font face="courier new, monospace" size="1">184217728</font></div></div></div><div style><div style><font face="courier new, monospace" size="1">184217728</font></div></div><div style><div style><font face="courier new, monospace" size="1">184217728</font></div>
</div></blockquote><div><br></div><div><br></div><div>Also for reference, here is part of my nginx.conf which may be useful for diagnosis:</div><div><br></div><div><br></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<div style><font face="courier new, monospace" size="1">worker_processes 7;</font></div><div style><font face="courier new, monospace" size="1">worker_rlimit_nofile 500000;</font></div><div style><font face="courier new, monospace" size="1"><br>
</font></div><div style><font face="courier new, monospace" size="1">events {</font></div><div style><font face="courier new, monospace" size="1"> use epoll;</font></div><div style><font face="courier new, monospace" size="1"> worker_connections 500000;</font></div>
<div style><font face="courier new, monospace" size="1"> multi_accept on;</font></div><div style><font face="courier new, monospace" size="1">}</font></div><div style><font face="courier new, monospace" size="1"><br></font></div>
<div style><font face="courier new, monospace" size="1">http {</font></div><div style><font face="courier new, monospace" size="1"> log_format main ...</font></div><div style><font face="courier new, monospace" size="1"><br>
</font></div><div style><font face="courier new, monospace" size="1"> access_log ...</font></div><div style><font face="courier new, monospace" size="1"><br></font></div><div style><font face="courier new, monospace" size="1"> ## -----------------------------------------------------------------------</font></div>
<div style><font face="courier new, monospace" size="1"> ## TCP Tuning</font></div><div style><font face="courier new, monospace" size="1"> ## -----------------------------------------------------------------------</font></div>
<div style><font face="courier new, monospace" size="1"> sendfile off;</font></div><div style><font face="courier new, monospace" size="1"> tcp_nopush off;</font></div><div style><font face="courier new, monospace" size="1"> tcp_nodelay on;</font></div>
<div style><font face="courier new, monospace" size="1"><br></font></div><div style><font face="courier new, monospace" size="1"> ## -----------------------------------------------------------------------</font></div><div style>
<font face="courier new, monospace" size="1"> ## Max Data Size</font></div><div style><font face="courier new, monospace" size="1"> ## -----------------------------------------------------------------------</font></div>
<div style><font face="courier new, monospace" size="1"> client_max_body_size 1k;</font></div><div style><font face="courier new, monospace" size="1"> client_body_buffer_size 1k;</font></div><div style>
<font face="courier new, monospace" size="1"> client_header_buffer_size 32k;</font></div><div style><font face="courier new, monospace" size="1"> large_client_header_buffers 200 32k;</font></div><div style><font face="courier new, monospace" size="1"><br>
</font></div><div style><font face="courier new, monospace" size="1"> ## -----------------------------------------------------------------------</font></div><div style><font face="courier new, monospace" size="1"> ## GZIP</font></div>
<div style><font face="courier new, monospace" size="1"> ## -----------------------------------------------------------------------</font></div><div style><font face="courier new, monospace" size="1"> gzip on;</font></div>
<div style><font face="courier new, monospace" size="1"> gzip_min_length 1000;</font></div><div style><font face="courier new, monospace" size="1"> gzip_disable msie6;</font></div><div style><font face="courier new, monospace" size="1"> gzip_proxied any;</font></div>
<div style><font face="courier new, monospace" size="1"> gzip_buffers 100 64k;</font></div><div style><font face="courier new, monospace" size="1"> gzip_types text/javascript;</font></div>
<div style><font face="courier new, monospace" size="1"><br></font></div><div style><font face="courier new, monospace" size="1"> ## -----------------------------------------------------------------------</font></div><div style>
<font face="courier new, monospace" size="1"> ## Proxy Load Distribution</font></div><div style><font face="courier new, monospace" size="1"> ## -----------------------------------------------------------------------</font></div>
<div style><font face="courier new, monospace" size="1"> proxy_redirect off;</font></div><div style><font face="courier new, monospace" size="1"> proxy_connect_timeout 5;</font></div><div style><font face="courier new, monospace" size="1"> proxy_send_timeout 5;</font></div>
<div style><font face="courier new, monospace" size="1"> proxy_read_timeout 8;</font></div><div style><font face="courier new, monospace" size="1"> proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;</font></div>
<div style><font face="courier new, monospace" size="1"> proxy_buffering off;</font></div><div style><font face="courier new, monospace" size="1"><br></font></div><div style><font face="courier new, monospace" size="1"> ## -----------------------------------------------------------------------</font></div>
<div style><font face="courier new, monospace" size="1"> ## Hide 'Server: nginx' Server Header</font></div><div style><font face="courier new, monospace" size="1"> ## -----------------------------------------------------------------------</font></div>
<div style><font face="courier new, monospace" size="1"> server_tokens off;</font></div><div style><font face="courier new, monospace" size="1"> proxy_pass_header Server;</font></div><div style><font face="courier new, monospace" size="1"><br>
</font></div><div style><font face="courier new, monospace" size="1"> upstream ...</font></div><div style><font face="courier new, monospace" size="1"><br></font></div><div style><font face="courier new, monospace" size="1"> server ...</font></div>
<div style><font face="courier new, monospace" size="1">}</font></div></blockquote></div>