Dropped connections on our Nginx

Wed Jun 20 04:54:59 UTC 2012

What's your fs ulimit set to?
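A quick way to check which open-files limit a running worker actually got is to read its `/proc/<pid>/limits` file (Linux only). The helper below is a hypothetical sketch, not part of Nginx: `worker_open_files_limit` assumes you pass it the PID of an Nginx worker (for example taken from `ps`), and the script also prints the limit of the current process via Python's `resource` module for comparison. Since `worker_rlimit_nofile` is meant to raise this limit for the workers, a worker's "Max open files" line should show 524288 if the directive took effect.

```python
import resource
from typing import Tuple


def parse_max_open_files(limits_text: str) -> Tuple[str, str]:
    """Return the (soft, hard) 'Max open files' values from /proc/<pid>/limits text."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns are: soft limit, hard limit, units
            fields = line[len(label):].split()
            return fields[0], fields[1]
    raise ValueError("no 'Max open files' line found")


def worker_open_files_limit(pid: int) -> Tuple[str, str]:
    """Read the open-files limit of another process (hypothetical helper, Linux /proc only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())


if __name__ == "__main__":
    # Limit of this Python process; the soft value normally comes from the shell's `ulimit -n`
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"current process: soft={soft} hard={hard}")
    # For a worker, call worker_open_files_limit(<worker pid>) instead.
```

Comparing the worker's values with those of a login shell shows whether the limit Nginx runs with differs from what `ulimit -n` reports interactively.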

On 2012-06-19, at 9:52 PM, Mauro Stettler <mauro.stettler at gmail.com> wrote:

> Hi list
> 
> I have a problem with dropped connections on an Nginx cluster that
> handles up to 100k requests per minute per Nginx instance. In around
> 1 of 10,000 requests sent to our Nginx, the server simply closes the
> TCP connection without sending a response. At first I guessed that
> some values in /etc/sysctl.conf might be causing this, because we
> have modified several TCP-related values there. But after resetting
> all of them to their defaults, the connection drops kept happening.
> 
> I suspect the problem is related to Nginx rather than to a kernel
> setting: only around 25% of all our requests are POSTs and the rest
> are GETs, but more than 90% of the requests where the problem appears
> are POSTs. I don't think the kernel can tell whether a request is a
> POST or a GET.
> 
> The problem happens on many different URLs, mostly ones we POST to,
> so it does not seem to be related to any rewrite rules.
> 
> I have captured the problem with tcpdump, and I can see that the
> client sent the request correctly. But after Nginx receives the
> request, it only sends back a packet with the ACK and FIN flags set.
> So the connection gets closed, and most browsers display an empty
> page or a "zero sized reply" error. Because the FIN is sent by the
> server, I assume the problem is not related to network hardware.
> We also have this problem on all Nginx instances in that cluster, so
> I don't think it is caused by broken networking hardware.
> 
> When the problem happens, I see entries like this one in the access
> log. As you can see, Nginx logs both the HTTP status code and the
> response length as 0:
> <ip> - - [20/Jun/2012:04:13:23 +0200] "POST /userProfile/rateResult?userId=<id>&_csrf_token=7e23ef60c67800c4765d32b0536fc536&rate=5 HTTP/1.1" 0 0 "<referer>" "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.6) Gecko/20091216 Mandriva Linux/1.9.1.6-0.1mdv2010.0 (2010.0) Firefox/3.5.6"
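The POST-versus-GET observation can be quantified straight from the access log by comparing the POST share among all entries with the POST share among entries logged with status 0. This is a sketch that assumes the stock "combined" log format; a customized `log_format` (for example one extended by the log_request_speed module) would need a different regex. The function names are hypothetical, and the log path is the `--http-log-path` from the configure arguments.

```python
import os
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumes nginx's default "combined" log format:
#   $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
LINE_RE = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "(\S+) [^"]*" (\d+) (\d+|-) ')
LOG_PATH = "/var/log/nginx/access.log"


def method_counts(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count request methods for all entries and for entries logged with status 0."""
    total, failed = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not match the assumed format
        method, status = m.group(1), m.group(2)
        total[method] += 1
        if status == "0":
            failed[method] += 1
    return total, failed


def post_share(counts: Counter) -> float:
    """Fraction of POST requests in a Counter (0.0 if it is empty)."""
    n = sum(counts.values())
    return counts["POST"] / n if n else 0.0


if __name__ == "__main__" and os.path.exists(LOG_PATH):
    with open(LOG_PATH, errors="replace") as log:
        total, failed = method_counts(log)
    print(f"POST share of all requests:    {post_share(total):.1%}")
    print(f"POST share of status-0 entries: {post_share(failed):.1%}")
```

Run over a full day of logs, this gives the two ratios (around 25% versus more than 90% in the figures above) from the same data, which makes it easier to see whether the skew holds at every hour.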
> 
> What I also find very interesting is that the problem can happen at
> any time, so it does not seem to be related to the load or the number
> of requests on Nginx. In the morning hours we have less than 5% of
> the evening traffic, and I still sometimes see the problem appear in
> the morning.
> 
> My Nginx config is too long to post here, so I am only posting the
> parts I think might be important, without all the rewrite rules:
> 
> user wwwrun www;
> worker_processes 64;
> worker_rlimit_nofile 524288;
> 
> events {
>    worker_connections 32768;
>    use epoll;
>    multi_accept on;
> }
> 
> http {
>    sendfile on;
>    tcp_nopush on;
>    keepalive_requests 0;
>    recursive_error_pages on;
>    large_client_header_buffers 4 16k;
> 
> Using tcpdump I also found that, for the requests where this problem
> appears, Nginx receives the incoming request, sends the correct
> request to the FastCGI backend and also receives the correct answer
> from the backend. But it has already closed the client's connection
> before the backend's answer arrives (the backend answers in less than
> 300ms).
> 
> Just in case it matters, this is my sysctl.conf:
> 
> net.ipv4.icmp_echo_ignore_broadcasts = 1
> net.ipv4.conf.all.rp_filter = 1
> fs.inotify.max_user_watches = 65536
> net.ipv4.conf.default.promote_secondaries = 1
> net.ipv4.conf.all.promote_secondaries = 1
> net.ipv4.ip_forward = 0
> net.ipv4.conf.lo.arp_ignore = 1
> net.ipv4.conf.lo.arp_announce = 2
> net.ipv4.conf.all.arp_ignore = 1
> net.ipv4.conf.all.arp_announce = 2
> net.netfilter.nf_conntrack_max = 262144
> net.nf_conntrack_max = 262144
> net.ipv4.tcp_max_syn_backlog = 30000
> net.ipv4.tcp_max_tw_buckets = 2000000
> net.core.netdev_max_backlog = 50000
> net.ipv4.tcp_tw_reuse = 0
> net.ipv4.tcp_tw_recycle = 1
> net.ipv4.tcp_fin_timeout = 3
> net.ipv4.tcp_keepalive_time = 120
> net.core.wmem_max = 8388608
> net.core.rmem_max = 8388608
> net.ipv4.tcp_rmem = 4096 87380 8388608
> net.ipv4.tcp_wmem = 4096 87380 8388608
> net.core.somaxconn = 1024
> kernel.pid_max = 65536
> net.ipv4.conf.all.log_martians = 0
> net.ipv4.conf.default.log_martians = 0
> net.ipv4.conf.lo.log_martians = 0
> net.ipv4.conf.eth0.log_martians = 0
> net.ipv4.conf.eth1.log_martians = 0
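Because the drops continued after the file was reverted, it may be worth confirming that the running kernel actually reflects /etc/sysctl.conf: edits to that file only take effect after `sysctl -p` or a reboot. The sketch below compares each `key = value` line with the live value under /proc/sys. The function names are hypothetical, and the simple dot-to-slash mapping assumes no key contains an interface name with a dot in it (such as a VLAN like eth0.100).

```python
from pathlib import Path
from typing import Dict, Optional, Tuple


def parse_sysctl_conf(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Collapse whitespace so "4096 87380 8388608" matches the tab-separated /proc output
            settings[key.strip()] = " ".join(value.split())
    return settings


def live_value(key: str, root: str = "/proc/sys") -> str:
    """Read a sysctl, e.g. net.ipv4.tcp_fin_timeout -> /proc/sys/net/ipv4/tcp_fin_timeout."""
    return " ".join(Path(root, *key.split(".")).read_text().split())


def diff_against_live(settings: Dict[str, str], root: str = "/proc/sys") -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {key: (configured, live)} for every setting that differs from the running kernel."""
    diffs: Dict[str, Tuple[str, Optional[str]]] = {}
    for key, wanted in settings.items():
        try:
            actual: Optional[str] = live_value(key, root)
        except OSError:
            actual = None  # key missing on this kernel, or not readable
        if actual != wanted:
            diffs[key] = (wanted, actual)
    return diffs


if __name__ == "__main__":
    conf = Path("/etc/sysctl.conf")
    if conf.exists():
        for key, (wanted, live) in sorted(diff_against_live(parse_sysctl_conf(conf.read_text())).items()):
            print(f"{key}: file={wanted} live={live}")
```

An empty result means the running kernel matches the file; any line printed shows a value that was changed in the file but never applied (or applied and later overridden).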
> 
> Our operating system is SUSE Linux Enterprise 11.0. The Nginx version
> and configure arguments are:
> 
> nginx version: nginx/1.2.1
> built by gcc 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux)
> configure arguments: --prefix=/usr/local/nginx-1.2.1
> --error-log-path=/var/log/nginx/error.log
> --http-log-path=/var/log/nginx/access.log
> --with-http_stub_status_module --without-http_autoindex_module
> --without-http_geo_module --without-http_map_module
> --without-http_referer_module --without-http_limit_conn_module
> --without-http_empty_gif_module --without-mail_pop3_module
> --without-mail_imap_module --without-mail_smtp_module
> --with-http_geoip_module --with-pcre=/usr/local/src/nginx/pcre-8.30
> --add-module=3rd/agentzh-nginx-eval-module-4eb2a02
> --add-module=3rd/ngx_http_log_request_speed
> --add-module=3rd/replay-ngx_http_generate_secure_download_links-4c1a46a
> --add-module=3rd/agentzh-memc-nginx-module-8befc56
> --add-module=3rd/agentzh-echo-nginx-module-080c0a1
> --add-module=3rd/replay-ngx_http_php_memcache_standard_balancer-4f7dcba
> --add-module=3rd/masterzen-nginx-upload-progress-module-a788dea
> --add-module=3rd/replay-ngx_http_php_session-30f69b3
> --add-module=3rd/simpl-ngx_devel_kit-24202b4
> --add-module=3rd/chaoslawful-lua-nginx-module-c5be5ff
> --add-module=3rd/replay-ngx_http_lower_upper_case-44958e0
> --add-module=3rd/gnosek-nginx-upstream-fair-a18b409
> 
> I cannot see anything suspicious in dmesg; there are no segfaults or
> related networking messages.
> 
> I have already tried raising the Nginx error log level, but I didn't
> see anything related to the problem, even at times when I saw it
> happening.
> 
> Now I don't really know what else to check... I would be really glad
> if somebody has some ideas.
> 
> Thanks for your help,
> 
> Mauro
> 
> _______________________________________________
> nginx mailing list
> nginx at nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx