<div dir="ltr">
<p class="">OK, I have an incredibly weird nginx connection issue.</p>
<p class="">I have a cluster of boxes that are responsible for terminating SSL requests and passing them to a local haproxy instance for further routing. I have corosync/pacemaker set up to manage the IP addresses and fail over instances if there’s an issue.</p>
<p class="">This server has been running fine for a long time, but we recently had to reboot because of the GHOST stuff. Before we did that, we did an apt-get upgrade to get to the latest Debian Wheezy packages, including a new nginx (1.6.2), openssl, kernel, and just about everything else.</p>
<p class="">After that happened, we started seeing connection issues to the nginx that does SSL termination. When it was happening, about 50% of our requests (iOS/Android clients) were timing out. I was testing manually with curl at the time, and I saw huge fluctuations in how long it took to connect. A lot of connections just timed out completely, while others took 1s, 3s, 15s, 30s, etc…</p>
<p class="">While nginx was having this issue, haproxy on the same box was unaffected. I tested that by curling it every second from a box close to it, logging the results, and checking them. So, it seemed to just be SSL with nginx.</p>
<p class="">Now that our peak load is down, it’s not as big an issue, but I am still seeing connection issues when I curl. There are fewer of them, and the delays are typically more like 1-3s. Since we’ve had some time to experiment, I’ve gathered more information that makes no sense to me.</p>
<p class="">Almost all the traffic was set up to go to the address managed by corosync. When I set up my curl tests to run every second against that address, I see the timeouts. So, I tried something. I bound nginx to the main IP address of the NIC as well, reloaded, and redid the same test with curl pointed at the main IP address. As soon as I did that, my curl tests never saw a single issue: the connect phase never took more than 2ms, and there were no timeouts.</p>
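<p class="">For anyone who wants to reproduce the comparison, here is a rough sketch of the once-a-second test in Python rather than curl. It is not the script I actually ran. It times the TCP connect and the TLS handshake separately, so you can see which phase stalls on each address. The function names <code>probe</code> and <code>run</code> and the default port 443 are assumptions made for illustration, and certificate verification is turned off because the probe targets bare IP addresses.</p>

```python
import socket
import ssl
import time


def probe(host, port=443, timeout=5.0, tls=True):
    """Return (connect_seconds, handshake_seconds).

    A phase that fails or times out is reported as None.
    """
    start = time.monotonic()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections and connect timeouts.
        return None, None
    connect_s = time.monotonic() - start

    if not tls:
        sock.close()
        return connect_s, None

    # Verification is disabled on purpose: we probe by IP, not by hostname.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    t0 = time.monotonic()
    try:
        # wrap_socket performs the handshake on the already-connected socket.
        with ctx.wrap_socket(sock):
            handshake_s = time.monotonic() - t0
    except OSError:
        sock.close()
        return connect_s, None
    return connect_s, handshake_s


def fmt(seconds):
    """Format a duration in milliseconds, or mark a failed phase."""
    return "FAIL" if seconds is None else f"{seconds * 1000:.1f}ms"


def run(targets, count=60, interval=1.0):
    """Probe every target once per interval and print one line per probe."""
    for _ in range(count):
        for host in targets:
            connect_s, handshake_s = probe(host)
            print(time.strftime("%H:%M:%S"), host,
                  "connect=" + fmt(connect_s),
                  "handshake=" + fmt(handshake_s))
        time.sleep(interval)
```

<p class="">You would call something like <code>run(["192.0.2.10", "192.0.2.11"])</code>, where the two addresses are placeholders for the corosync-managed IP and the main NIC IP. Probing both in the same loop makes it easy to line up the slow connects on one address against the fast ones on the other.</p>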
<p class="">So, I started thinking it was the corosync IP, and I sent all our traffic to the main NIC IP address that had just tested fine. Once the normal traffic levels had switched over to the main NIC IP, I started seeing curl timeouts there, now that it had traffic. So, I then started curling the corosync IP that used to be primary, and now IT has no connection issues.</p>
<p class="">So, I have connection issues to nginx, but only on the IP address that takes the traffic. nginx on a different IP on the same NIC is fine. haproxy on the same NIC is fine.</p>
<p class="">What the heck? Struggling to think of anything I could tweak. This doesn’t make sense, but I have triple checked my info, and it’s legit.</p>
</div>