Server very delayed in sending SYN/ACK

Krishna Kumar (Engineering) krishna.ku at flipkart.com
Sun Sep 4 02:49:51 UTC 2016


Hi Will,

> * In the packet capture from the server, I see the SYN packet come in,
> then 3 more retransmits of that same syn come in before the server sent
> back the SYN/ACK. To me this indicates the issue in kernel or nginx side.

Clients often send multiple SYNs. You can open the capture in Wireshark
and check the timestamps. If the packets are very close together (within
milliseconds), that is normal; otherwise you have a problem on the server.

nginx does not come into the picture during the TCP handshake; its job is
done once it has called listen() to indicate that the socket is ready to
accept connections. Once the final ACK arrives, the connection is ready,
and if accept() is called it will succeed (as in, it does not block).
However, the client sees connect() succeed as soon as the TCP handshake
finishes, not when the application completes its accept() call.
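
To make the point above concrete, here is a minimal Python sketch (not nginx
code; the function name is made up for illustration). It assumes a loopback
interface is available. The server side only calls listen(), and the client's
connect() still completes because the kernel finishes the handshake on its
own. accept() is called afterwards and simply picks up the connection that is
already established.

```python
import socket


def connect_before_accept():
    """Return True if connect() completed before the server called accept()."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(8)               # from here on the kernel answers SYNs
        port = server.getsockname()[1]

        # The application has NOT called accept() yet, but the handshake
        # is completed by the kernel, so connect() returns successfully.
        client.settimeout(5)
        client.connect(("127.0.0.1", port))
        client_addr = client.getsockname()

        # accept() now just dequeues the already-established connection.
        server.settimeout(5)
        conn, peer = server.accept()
        conn.close()
        return peer == client_addr
    finally:
        client.close()
        server.close()
```

Calling connect_before_accept() should return True on a typical Linux or
macOS machine. If the SYN/ACK itself is late, as in the capture described
here, the delay happens before the application ever sees the connection.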

Attaching the tcpdump capture might help someone spot what is wrong. Are
the initial packets being dropped by the kernel due to bad checksums? Do
you have any iptables rules that might drop or rate-limit SYNs? Do you see
retransmissions (netstat -s)? You could run netstat -s before and after
the slowdown to see which counters increase and derive some clues from
that.
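
The before/after comparison can be scripted. The sketch below is only an
illustration and assumes a Linux host: instead of parsing the free-form text
of netstat -s, it reads /proc/net/snmp and /proc/net/netstat, the kernel
counter files that netstat -s is built on. The function names are made up
for this example.

```python
def parse_counters(text):
    """Parse the header-line/value-line pairs used by /proc/net/snmp and
    /proc/net/netstat into a dict such as {"TcpExt.ListenDrops": 3}."""
    counters = {}
    lines = text.strip().splitlines()
    # Each group is two lines: "Prefix: name1 name2" then "Prefix: 1 2".
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        _, numbers = values.split(":", 1)
        for name, number in zip(names.split(), numbers.split()):
            counters[f"{prefix.strip()}.{name}"] = int(number)
    return counters


def snapshot(paths=("/proc/net/snmp", "/proc/net/netstat")):
    """Read the current counters (Linux only)."""
    counters = {}
    for path in paths:
        with open(path) as handle:
            counters.update(parse_counters(handle.read()))
    return counters


def changed_counters(before, after):
    """Return only the counters whose value differs, with the delta."""
    return {
        name: value - before.get(name, 0)
        for name, value in after.items()
        if value != before.get(name, 0)
    }
```

Typical use would be to call before = snapshot(), wait for or reproduce the
slow connects, call after = snapshot(), and then print
changed_counters(before, after). For delayed SYN/ACKs, increases in
Tcp.RetransSegs, TcpExt.ListenOverflows or TcpExt.ListenDrops would be
natural places to start looking.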


On Sun, Sep 4, 2016 at 5:36 AM, Will Platnick <wplatnick at gmail.com> wrote:

> Hello,
> I have run into a very interesting issue.  I am replacing a set of nginx
> reverse proxy servers with a new set running an updated OS/nginx. These
> nginx servers front a popular API that's mostly used by mobile apps, but
> also a website that's hosted on a nearby subnet. I put the new servers into
> service last night, and this morning as traffic picked up (only a couple
> thousand requests per second), I got alerts from my DNS provider that
> requests to the new server were starting to time out in the connect phase.
> I hopped into New Relic, and I could see tons of requests from my website
> to the nginx reverse proxy timing out after it hit our limit of 10s. I did
> some curl requests with timing information, and I could see long times only
> in the time_connect level, confirming the issue was only in the connection
> phase. I hopped on the new nginx server and started a packet capture
> filtered to a machine on a nearby subnet, did the curl from there, got it
> taking 9+ seconds in the connect phase, stopped the packet capture, and
> moved the traffic over to my old setup. No issues over there.
>
> Here's everything I know/think is relevant:
>
> * In the packet capture from the server, I see the SYN packet come in,
> then 3 more retransmits of that same syn come in before the server sent
> back the SYN/ACK. To me this indicates the issue in kernel or nginx side.
>
> * There's absolutely no slowdown in the backends as measured from the same
> nginx server.
>
> * There's nothing in the nginx error log
>
> * There's nothing from the kernel in dmesg when this is happening
>
> * NIC duplex is fine, no dropped queues from ethtool -S (but, again, it
> doesn't seem like a networking issue, we got the SYNs just fine, we just
> didn't send the syn/ack)
>
> * I tried to artificially load test afterwards using ab and loader.io,
> doing 3x as many requests, but couldn't replicate the issue. I'm not sure
> if it's some weird issue due to misbehaving mobile clients and SSL filling
> up some sort of queue, but whatever it is, I can't replicate the issue on
> demand.
>
> * Load on the box was fine (<4) and no crazy I/O.
>
> * Keepalives were turned on
>
> * Some relevant sysctl values:
>
> cat /proc/sys/net/core/somaxconn (backlog is set to the same in the nginx
> config)
> 16384
>
> cat /proc/sys/net/core/netdev_max_backlog
> 15000
>
> cat /proc/sys/net/ipv4/tcp_max_syn_backlog
> 262144
>
> NGINX: 1.11.3
> OS: Ubuntu 16.04.1 x64
> Kernel: 4.4.0-36-generic
>
> It seems to me the issue is at the kernel/app level, but I can't think of
> where to go from here.
>
> If anybody has any ideas for me to try, or if I've forgotten to mention
> something relevant, please let me know.