Dropped connections on our Nginx
Payam Chychi
pchychi at gmail.com
Wed Jun 20 04:54:59 UTC 2012
What's your fs ulimit set to?
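
One way to answer that for the running workers, rather than for your login shell, is to read the limits the kernel reports for each worker process. Below is a minimal sketch, assuming Linux and the usual /proc/<pid>/limits layout; the function name parse_max_open_files is made up for illustration.

```python
import re

def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Returns None if the row is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns are separated by runs of spaces: name, soft, hard, units.
            columns = re.split(r"\s{2,}", line.strip())
            return columns[1], columns[2]
    return None
```

Feed it the contents of /proc/<pid>/limits for one of the nginx worker PIDs and compare the result with the worker_rlimit_nofile value from the config (524288 in the quoted message).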
On 2012-06-19, at 9:52 PM, Mauro Stettler <mauro.stettler at gmail.com> wrote:
> Hi list
>
> I have a problem with dropped connections on an Nginx cluster that
> handles up to 100k requests per minute per Nginx instance. For roughly
> 1 in 10,000 requests sent to our Nginx, the server simply resets the
> TCP connection. At first I guessed that some values in
> /etc/sysctl.conf were causing this, because we have modified several
> TCP-related values there. But after resetting all of them to their
> defaults, the connection resets kept happening.
>
> I am guessing the problem must be related to Nginx and not to a kernel
> setting, because only around 25% of all requests in our traffic are
> POSTs and the rest are GETs, yet more than 90% of the requests where
> the problem appears are POSTs. I don't think the kernel can be aware
> of whether a request is a POST or a GET.
>
> The problem happens on many different URLs, mostly ones where we POST
> to, so it does not seem to be related to any rewrite rules.
>
> I have tcpdumped the problem and I can see that the request was sent
> correctly by the client. But after the request was received by the
> Nginx, it only sends back a packet with the ACK and FIN flags set. So
> the connection gets killed and most of the browsers display some empty
> pages or "zero sized reply" errors. The fact that the FIN is sent by
> the server makes me assume that the problem cannot be related to
> network hardware. Also we have this problem on all Nginx instances
> inside that cluster, so I don't think it's related to broken networking
> hardware.
>
> When the problem happens, I see entries like this one in the access
> log. As you can see, Nginx logs both the HTTP status code and the
> length as 0:
> <ip> - - [20/Jun/2012:04:13:23 +0200] "POST
> /userProfile/rateResult?userId=<id>&_csrf_token=7e23ef60c67800c4765d32b0536fc536&rate=5
> HTTP/1.1" 0 0 "<referer>" "Mozilla/5.0 (X11; U; Linux x86_64; en-US;
> rv:1.9.1.6) Gecko/20091216 Mandriva Linux/1.9.1.6-0.1mdv2010.0
> (2010.0) Firefox/3.5.6"
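
To put numbers on the POST/GET split for these status-0 entries, the access log can be tallied per method. This is a minimal sketch, assuming each entry sits on one line in the layout shown above (request line in quotes, followed by status and length); LOG_RE and zero_status_share are made-up names for illustration.

```python
import re
from collections import Counter

# Matches: "METHOD /url HTTP/1.1" <status> <length>
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) \S+ HTTP/[\d.]+" (?P<status>\d{1,3}) (?P<size>\d+)'
)

def zero_status_share(lines):
    """Map each HTTP method to (entries logged with status 0, total entries)."""
    total, dropped = Counter(), Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if not match:
            continue  # skip lines that do not look like access-log entries
        method = match.group("method")
        total[method] += 1
        if match.group("status") == "0":
            dropped[method] += 1
    return {method: (dropped[method], total[method]) for method in total}
```

Passing an open file handle for the access log works, since the function only iterates over lines.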
>
> What I also find very interesting is that the problem can happen at
> any time, so it does not seem to be related to the load or number of
> requests on the Nginx. In the morning hours we have less than 5% of
> the traffic of the evening hours, and still I sometimes see this
> problem appearing in the morning.
>
> My Nginx config is too long to post here in full, so I am only
> posting the parts that I think might be important, without all the
> rewrite rules:
>
> user wwwrun www;
> worker_processes 64;
> worker_rlimit_nofile 524288;
>
> events {
> worker_connections 32768;
> use epoll;
> multi_accept on;
> }
>
> http {
> sendfile on;
> tcp_nopush on;
> keepalive_requests 0;
> recursive_error_pages on;
> large_client_header_buffers 4 16k;
>
> What I also found via tcpdump is that for the requests where this
> problem appears, Nginx receives the incoming request, sends the
> correct request to the FastCGI backend, and the backend does return
> the correct answer. But before that answer arrives (it takes less
> than 300ms), Nginx has already reset the client's connection.
>
> Just in case it matters, this is my sysctl.conf:
>
> net.ipv4.icmp_echo_ignore_broadcasts = 1
> net.ipv4.conf.all.rp_filter = 1
> fs.inotify.max_user_watches = 65536
> net.ipv4.conf.default.promote_secondaries = 1
> net.ipv4.conf.all.promote_secondaries = 1
> net.ipv4.ip_forward = 0
> net.ipv4.conf.lo.arp_ignore = 1
> net.ipv4.conf.lo.arp_announce = 2
> net.ipv4.conf.all.arp_ignore = 1
> net.ipv4.conf.all.arp_announce = 2
> net.netfilter.nf_conntrack_max = 262144
> net.nf_conntrack_max = 262144
> net.ipv4.tcp_max_syn_backlog = 30000
> net.ipv4.tcp_max_tw_buckets = 2000000
> net.core.netdev_max_backlog = 50000
> net.ipv4.tcp_tw_reuse = 0
> net.ipv4.tcp_tw_recycle = 1
> net.ipv4.tcp_fin_timeout = 3
> net.ipv4.tcp_keepalive_time = 120
> net.core.wmem_max = 8388608
> net.core.rmem_max = 8388608
> net.ipv4.tcp_rmem = 4096 87380 8388608
> net.ipv4.tcp_wmem = 4096 87380 8388608
> net.core.somaxconn = 1024
> kernel.pid_max = 65536
> net.ipv4.conf.all.log_martians = 0
> net.ipv4.conf.default.log_martians = 0
> net.ipv4.conf.lo.log_martians = 0
> net.ipv4.conf.eth0.log_martians = 0
> net.ipv4.conf.eth1.log_martians = 0
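
When switching between this file and the defaults, it can help to confirm which values the kernel is actually running with. This is a rough sketch, assuming Linux and that each dotted key maps directly to a path under /proc/sys (which fails for interface names that contain dots); all function names here are made up for illustration.

```python
def parse_sysctl_conf(text):
    """Parse sysctl.conf-style 'key = value' lines into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            # Collapse whitespace so "4096 87380 8388608" compares cleanly.
            settings[key.strip()] = " ".join(value.split())
    return settings

def read_live_value(key):
    """Read the running value of a sysctl key from /proc/sys."""
    with open("/proc/sys/" + key.replace(".", "/")) as f:
        return " ".join(f.read().split())

def find_mismatches(conf_text, read_value=read_live_value):
    """Return {key: (value in file, live value)} for keys that differ."""
    mismatches = {}
    for key, wanted in parse_sysctl_conf(conf_text).items():
        try:
            live = read_value(key)
        except OSError:
            live = None  # key not present on this kernel
        if live != wanted:
            mismatches[key] = (wanted, live)
    return mismatches
```

Calling find_mismatches with the contents of /etc/sysctl.conf lists every setting whose running value differs from the file.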
>
> Our operating system is SuSE Linux Enterprise 11.0. The Nginx
> configure params are the following:
>
> nginx version: nginx/1.2.1
> built by gcc 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux)
> configure arguments: --prefix=/usr/local/nginx-1.2.1
> --error-log-path=/var/log/nginx/error.log
> --http-log-path=/var/log/nginx/access.log
> --with-http_stub_status_module --without-http_autoindex_module
> --without-http_geo_module --without-http_map_module
> --without-http_referer_module --without-http_limit_conn_module
> --without-http_empty_gif_module --without-mail_pop3_module
> --without-mail_imap_module --without-mail_smtp_module
> --with-http_geoip_module --with-pcre=/usr/local/src/nginx/pcre-8.30
> --add-module=3rd/agentzh-nginx-eval-module-4eb2a02
> --add-module=3rd/ngx_http_log_request_speed
> --add-module=3rd/replay-ngx_http_generate_secure_download_links-4c1a46a
> --add-module=3rd/agentzh-memc-nginx-module-8befc56
> --add-module=3rd/agentzh-echo-nginx-module-080c0a1
> --add-module=3rd/replay-ngx_http_php_memcache_standard_balancer-4f7dcba
> --add-module=3rd/masterzen-nginx-upload-progress-module-a788dea
> --add-module=3rd/replay-ngx_http_php_session-30f69b3
> --add-module=3rd/simpl-ngx_devel_kit-24202b4
> --add-module=3rd/chaoslawful-lua-nginx-module-c5be5ff
> --add-module=3rd/replay-ngx_http_lower_upper_case-44958e0
> --add-module=3rd/gnosek-nginx-upstream-fair-a18b409
>
> In the dmesg I cannot see anything suspicious, there are no segfaults
> or related networking messages.
>
> I have already tried setting the Nginx error log to a more verbose
> log level, but I didn't see anything related to my problem, even at
> times when I saw that the problem was happening.
>
> Now I don't really know what else to check... I would be really
> glad if somebody had some ideas.
>
> Thanks for help,
>
> Mauro
>