<div dir="ltr"><div>Hi folks,<br><br>We are using Nginx for SSL termination, and then it proxies to an ATS or Haproxy server depending on our environment. <br><br>We're running into a problem where every now and then, Nginx closes a
connection due to a timeout. When investigating, it looks like the
connections that are being timed-out are not being forwarded to the
backend service. The scenario when we were able to best reproduce this is one where one of our Java client was running about 100 REST requests that were fairly similar. I've attached files that contain both the tcpdump from the client side as well as the debug log on the nginx side.<br>
<br></div><div>I tried comparing a successful and unsuccessful request next to each other. From the client side, it looks like the messages back and forth look very consistent. On the nginx side, the first difference seems to happen when reading in the Http Request. The requests that fail, all seem to do a partial read:<br>
<br>==SNIPPET FROM FAILING REQUEST==<br><br>2013/11/20 01:58:05 [debug] 52611#0: *621 http wait request handler<br>2013/11/20 01:58:05 [debug] 52611#0: *621 malloc: 0000000000F1BE20:1024<br>2013/11/20 01:58:05 [debug] 52611#0: *621 SSL_read: 335<br>
2013/11/20 01:58:05 [debug] 52611#0: *621 SSL_read: 1<br>2013/11/20 01:58:05 [debug] 52611#0: *621 SSL_read: -1<br>2013/11/20 01:58:05 [debug] 52611#0: *621 SSL_get_error: 2<br>2013/11/20 01:58:05 [debug] 52611#0: *621 reusable connection: 0<br>
2013/11/20 01:58:05 [debug] 52611#0: *621 posix_memalign: 0000000000EFB600:4096 @16<br>2013/11/20 01:58:05 [debug] 52611#0: *621 http process request line<br><br><br></div><div>==SNIPPET FROM SUCCEEDING REQUEST==<br></div>
<div>2013/11/20 01:58:04 [debug] 52611#0: *619 http wait request handler<br>2013/11/20 01:58:04 [debug] 52611#0: *619 malloc: 0000000000F1BE20:1024<br>2013/11/20 01:58:04 [debug] 52611#0: *619 SSL_read: 335<br>2013/11/20 01:58:04 [debug] 52611#0: *619 SSL_read: 1<br>
2013/11/20 01:58:04 [debug] 52611#0: *619 SSL_read: 167<br>2013/11/20 01:58:04 [debug] 52611#0: *619 SSL_read: 1<br>2013/11/20 01:58:04 [debug] 52611#0: *619 SSL_read: 2<br>2013/11/20 01:58:04 [debug] 52611#0: *619 SSL_read: 1<br>
2013/11/20 01:58:04 [debug] 52611#0: *619 SSL_read: 1<br>2013/11/20 01:58:04 [debug] 52611#0: *619 SSL_read: -1<br>2013/11/20 01:58:04 [debug] 52611#0: *619 SSL_get_error: 2<br>2013/11/20 01:58:04 [debug] 52611#0: *619 reusable connection: 0<br>
2013/11/20 01:58:04 [debug] 52611#0: *619 posix_memalign: 0000000000EFB600:4096 @16<br>2013/11/20 01:58:04 [debug] 52611#0: *619 http process request line<br><br></div><div>This difference seems to be consistent with errors. Later on, the request that ends up failing attempts to load the request body, but only gets about 16 bytes, and looks like it keeps waiting for the rest of the data. However, looking at a tcpdump from the client, it looks like all the data is sent up and ack-ed. Then the client sees nothing for a minute until the connection is closed by the server.<br>
<br></div><div>==TCP DUMP FROM FAILING REQUEST==<br>...<br>17:58:05.259451 IP x.x.x.145.53167 > y.y.y.209.https: Flags [.], ack 2996, win 8023, options [nop,nop,TS val 145724966 ecr 155421032], length 0<br>17:58:05.261592 IP x.x.x.145.53167 > y.y.y.209.https: Flags [P.], seq 221:296, ack 2996, win 8192, options [nop,nop,TS val 145724968 ecr 155421032], length 75<br>
17:58:05.262989 IP x.x.x.145.53167 > y.y.y.209.https: Flags [P.], seq 296:302, ack 2996, win 8192, options [nop,nop,TS val 145724969 ecr 155421032], length 6<br>17:58:05.263196 IP x.x.x.145.53167 > y.y.y.209.https: Flags [P.], seq 302:355, ack 2996, win 8192, options [nop,nop,TS val 145724969 ecr 155421032], length 53<br>
17:58:05.295347 IP y.y.y.209.https > x.x.x.145.53167: Flags [.], ack 355, win 122, options [nop,nop,TS val 155421041 ecr 145724968], length 0<br>17:58:05.295387 IP y.y.y.209.https > x.x.x.145.53167: Flags [P.], seq 2996:3055, ack 355, win 122, options [nop,nop,TS val 155421042 ecr 145724968], length 59<br>
17:58:05.295481 IP x.x.x.145.53167 > y.y.y.209.https: Flags [.], ack 3055, win 8188, options [nop,nop,TS val 145725000 ecr 155421042], length 0<br>17:58:05.300058 IP x.x.x.145.53167 > y.y.y.209.https: Flags [P.], seq 355:728, ack 3055, win 8192, options [nop,nop,TS val 145725004 ecr 155421042], length 373<br>
17:58:05.300253 IP x.x.x.145.53167 > y.y.y.209.https: Flags [P.], seq 728:765, ack 3055, win 8192, options [nop,nop,TS val 145725004 ecr 155421042], length 37<br>17:58:05.300254 IP x.x.x.145.53167 > y.y.y.209.https: Flags [P.], seq 765:962, ack 3055, win 8192, options [nop,nop,TS val 145725004 ecr 155421042], length 197<br>
17:58:05.300282 IP x.x.x.145.53167 > y.y.y.209.https: Flags [P.], seq 962:999, ack 3055, win 8192, options [nop,nop,TS val 145725004 ecr 155421042], length 37<br>17:58:05.300307 IP x.x.x.145.53167 > y.y.y.209.https: Flags [P.], seq 999:1036, ack 3055, win 8192, options [nop,nop,TS val 145725004 ecr 155421042], length 37<br>
17:58:05.300342 IP x.x.x.145.53167 > y.y.y.209.https: Flags [P.], seq 1036:1073, ack 3055, win 8192, options [nop,nop,TS val 145725004 ecr 155421042], length 37<br>17:58:05.300365 IP x.x.x.145.53167 > y.y.y.209.https: Flags [P.], seq 1073:1110, ack 3055, win 8192, options [nop,nop,TS val 145725004 ecr 155421042], length 37<br>
17:58:05.332040 IP y.y.y.209.https > x.x.x.145.53167: Flags [.], ack 765, win 130, options [nop,nop,TS val 155421051 ecr 145725004], length 0<br>17:58:05.332247 IP y.y.y.209.https > x.x.x.145.53167: Flags [.], ack 1110, win 139, options [nop,nop,TS val 155421051 ecr 145725004], length 0<br>
</div><div><b>17:59:05.338429 IP y.y.y.209.https > x.x.x.145.53167: Flags [F.], seq 3055, ack 1110, win 139, options [nop,nop,TS val 155436119 ecr 145725004], length 0</b><br>17:59:05.338581 IP x.x.x.145.53167 > y.y.y.209.https: Flags [.], ack 3056, win 8192, options [nop,nop,TS val 145784751 ecr 155436119], length 0<br>
17:59:05.338932 IP x.x.x.145.53167 > y.y.y.209.https: Flags [P.], seq 1110:1147, ack 3056, win 8192, options [nop,nop,TS val 145784751 ecr 155436119], length 37<br>17:59:05.338933 IP x.x.x.145.53167 > y.y.y.209.https: Flags [F.], seq 1147, ack 3056, win 8192, options [nop,nop,TS val 145784751 ecr 155436119], length 0<br>
17:59:05.370400 IP y.y.y.209.https > x.x.x.145.53167: Flags [R], seq 325468762, win 0, length 0<br><br></div><div>==TCP DUMP FROM SUCCESSFUL REQUEST==<br>...<br>17:58:04.481722 IP x.x.x.145.53166 > y.y.y.209.https: Flags [.], ack 2996, win 8023, options [nop,nop,TS val 145724196 ecr 155420837], length 0<br>
17:58:04.484108 IP x.x.x.145.53166 > y.y.y.209.https: Flags [P.], seq 221:296, ack 2996, win 8192, options [nop,nop,TS val 145724198 ecr 155420837], length 75<br>17:58:04.485569 IP x.x.x.145.53166 > y.y.y.209.https: Flags [P.], seq 296:302, ack 2996, win 8192, options [nop,nop,TS val 145724199 ecr 155420837], length 6<br>
17:58:04.485767 IP x.x.x.145.53166 > y.y.y.209.https: Flags [P.], seq 302:355, ack 2996, win 8192, options [nop,nop,TS val 145724199 ecr 155420837], length 53<br>17:58:04.531685 IP y.y.y.209.https > x.x.x.145.53166: Flags [.], ack 355, win 122, options [nop,nop,TS val 155420850 ecr 145724198], length 0<br>
17:58:04.531689 IP y.y.y.209.https > x.x.x.145.53166: Flags [P.], seq 2996:3055, ack 355, win 122, options [nop,nop,TS val 155420850 ecr 145724198], length 59<br>17:58:04.531827 IP x.x.x.145.53166 > y.y.y.209.https: Flags [.], ack 3055, win 8188, options [nop,nop,TS val 145724244 ecr 155420850], length 0<br>
17:58:04.532709 IP x.x.x.145.53166 > y.y.y.209.https: Flags [P.], seq 355:728, ack 3055, win 8192, options [nop,nop,TS val 145724244 ecr 155420850], length 373<br>17:58:04.532906 IP x.x.x.145.53166 > y.y.y.209.https: Flags [P.], seq 728:765, ack 3055, win 8192, options [nop,nop,TS val 145724245 ecr 155420850], length 37<br>
17:58:04.532954 IP x.x.x.145.53166 > y.y.y.209.https: Flags [P.], seq 765:962, ack 3055, win 8192, options [nop,nop,TS val 145724245 ecr 155420850], length 197<br>17:58:04.532983 IP x.x.x.145.53166 > y.y.y.209.https: Flags [P.], seq 962:999, ack 3055, win 8192, options [nop,nop,TS val 145724245 ecr 155420850], length 37<br>
17:58:04.533012 IP x.x.x.145.53166 > y.y.y.209.https: Flags [P.], seq 999:1036, ack 3055, win 8192, options [nop,nop,TS val 145724245 ecr 155420850], length 37<br>17:58:04.533045 IP x.x.x.145.53166 > y.y.y.209.https: Flags [P.], seq 1036:1073, ack 3055, win 8192, options [nop,nop,TS val 145724245 ecr 155420850], length 37<br>
17:58:04.533143 IP x.x.x.145.53166 > y.y.y.209.https: Flags [P.], seq 1073:1110, ack 3055, win 8192, options [nop,nop,TS val 145724245 ecr 155420850], length 37<br>17:58:04.565176 IP y.y.y.209.https > x.x.x.145.53166: Flags [.], ack 962, win 139, options [nop,nop,TS val 155420858 ecr 145724244], length 0<br>
17:58:04.565184 IP y.y.y.209.https > x.x.x.145.53166: Flags [.], ack 1110, win 139, options [nop,nop,TS val 155420858 ecr 145724245], length 0<br><b>17:58:05.184331 IP y.y.y.209.https > x.x.x.145.53166: Flags [P.], seq 3055:3540, ack 1110, win 139, options [nop,nop,TS val 155421011 ecr 145724245], length 485</b><br>
17:58:05.184442 IP x.x.x.145.53166 > y.y.y.209.https: Flags [.], ack 3540, win 8161, options [nop,nop,TS val 145724894 ecr 155421011], length 0<br>17:58:05.184592 IP y.y.y.209.https > x.x.x.145.53166: Flags [F.], seq 3540, ack 1110, win 139, options [nop,nop,TS val 155421011 ecr 145724245], length 0<br>
17:58:05.184658 IP x.x.x.145.53166 > y.y.y.209.https: Flags [.], ack 3541, win 8192, options [nop,nop,TS val 145724894 ecr 155421011], length 0<br>17:58:05.184863 IP x.x.x.145.53166 > y.y.y.209.https: Flags [P.], seq 1110:1147, ack 3541, win 8192, options [nop,nop,TS val 145724894 ecr 155421011], length 37<br>
17:58:05.184863 IP x.x.x.145.53166 > y.y.y.209.https: Flags [F.], seq 1147, ack 3541, win 8192, options [nop,nop,TS val 145724894 ecr 155421011], length 0<br><br></div><div><br>When I look at the TCP dumps, it looks like the client sends up all the data, and it looks like the server receives and acks it. At this point, I'm sorta stuck. Does anyone have any insight here? Is there a know bug that has been fixed that we may be missing? I've attached files that have more complete dumps of the requests.<br>
<br></div><div>We are running Ubuntu 12.04.3 LTS, with:<br> nginx version: nginx/1.4.1<br> openssl version: OpenSSL 1.0.1 14 Mar 2012<br><br></div><div> Thanks,<br></div><div> --Will<br></div><div><br><br><br>
</div><div><br></div><div><br></div><br></div>