Weird nginx SSL connection issues

Will Platnick wplatnick at
Sun Feb 1 23:47:15 UTC 2015

OK, I have an incredibly weird nginx connection issue.

I have a cluster of boxes that are responsible for terminating SSL requests
and passing them to a local haproxy instance for further routing. I have
corosync/pacemaker set up to manage the IP addresses and fail over instances
if there’s an issue.

This server has been running fine for a long time, but we recently had to
reboot because of the GHOST stuff. Before we did that, we did an apt-get
upgrade to get to the latest Debian Wheezy packages, including a new nginx
(1.6.2), openssl, kernel, and just about everything else.

After that happened, we started seeing connection issues to the nginx that
does SSL termination. When it was happening, about 50% of our requests
(from iOS/Android clients) were timing out. I was testing manually using
curl while it was happening, and saw huge fluctuations in the time it took
to connect. A lot of connections timed out completely, and others took 1s,
3s, 15s, 30s, etc. to connect.
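
A probe of the kind described above can be sketched in Python. This is not
the curl command used in the tests; it is a minimal illustration that times
the TCP connect and the TLS handshake separately, so it is visible which
phase stalls. The function name `probe`, the 5-second default timeout, and
the choice to skip certificate verification are all assumptions made for
the example.

```python
import socket
import ssl
import time


def probe(host, port=443, timeout=5.0, use_tls=True):
    """Return (tcp_connect_seconds, tls_handshake_seconds).

    A value is None if that phase failed, timed out, or was skipped.
    """
    start = time.monotonic()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused connections and timeouts both end up here.
        return None, None
    tcp_time = time.monotonic() - start

    if not use_tls:
        sock.close()
        return tcp_time, None

    # Assumption: verification is turned off because the probe targets a
    # bare IP address. This is for timing only, never for judging trust.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    start = time.monotonic()
    try:
        tls_sock = ctx.wrap_socket(sock)  # the handshake happens here
    except OSError:
        # ssl.SSLError is a subclass of OSError, so it is covered too.
        sock.close()
        return tcp_time, None
    tls_time = time.monotonic() - start
    tls_sock.close()
    return tcp_time, tls_time
```

Calling `probe()` once a second against the address under test and logging
both values gives roughly the same picture as the curl loop: a None or a
multi-second first value points at the TCP connect, while a fast connect
followed by a slow second value points at the SSL handshake.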

When this issue was happening to nginx, haproxy on the same box was
unaffected. I tested that by curling it every second from a box close to
it, logging the results, and checking them. So, it seemed to be just SSL
with nginx.

Now that our peak load is down, it’s not as big an issue, but we are still
seeing connection issues when I curl. There are fewer of them, and the
delays are typically more like 1-3s. Since we’ve had some time to
experiment, I’ve gathered more information that makes no sense to me.

Almost all the traffic was set up to go to the address managed by corosync.
When I set up my curl tests to run every second against it, I see the
timeouts. So, I tried something. I had nginx listen on the main IP address
of the NIC, reloaded, and redid the same test, but pointed curl at the main
IP address. As soon as I did that, my curl tests never saw a single issue:
the connect phase never took more than 2ms and there were no timeouts.

So, I started thinking it was the corosync IP. I sent all our traffic to
the main NIC IP address that had just tested fine, and once the normal
traffic levels had switched over to it, my curl tests against that address
started timing out. I then started curling the corosync IP that used to be
primary, and now IT has no connection issues.
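
To make the busy-address versus idle-address comparison concrete, the
logged connect times for each address can be reduced to a few numbers. The
sketch below is one hypothetical way to do that in Python. The function
name `summarize` and the convention of recording a timeout as None are
assumptions for illustration, not something taken from the tests above.

```python
import statistics


def summarize(samples):
    """Summarize connect times for one address.

    samples: connect times in seconds, with None marking a timeout.
    """
    ok = [s for s in samples if s is not None]
    return {
        "count": len(samples),
        "timeouts": len(samples) - len(ok),
        # Median and worst case over the connections that did complete.
        "median_ms": round(statistics.median(ok) * 1000, 1) if ok else None,
        "max_ms": round(max(ok) * 1000, 1) if ok else None,
    }
```

Running this on the log for the address carrying traffic and on the log
for the idle address, over the same time window, shows side by side
whether the timeouts and long connects follow the traffic rather than a
particular IP.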

So, I have connection issues to nginx, but only on the IP address that takes
the traffic. nginx on a different IP on the same NIC is fine. haproxy on
the same NIC is fine.

What the heck? Struggling to think of anything I could tweak. This doesn’t
make sense, but I have triple checked my info, and it’s legit.
