Dropped connections on our Nginx

Mauro Stettler mauro.stettler at gmail.com
Wed Jun 20 05:08:31 UTC 2012


It's set to a high value:

rsid-a-57:/etc/nginx # ulimit -n
524288

I think if the problem were the fs ulimit, I would see an error saying
that no more files can be opened, but there is no such error.
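
Note that `ulimit -n` only reports the limit of the shell it is run in.
The limit that applies to an already running process can be read from
/proc/<pid>/limits on Linux. Below is a minimal sketch of such a check,
assuming Linux and that the worker PIDs are passed on the command line;
the function names are made up for illustration. With
worker_rlimit_nofile set as in the config below, the workers would be
expected to report 524288.

```python
import sys
from pathlib import Path


def max_open_files(limits_text):
    """Return the (soft, hard) 'Max open files' values as strings.

    limits_text is the content of /proc/<pid>/limits. Values are kept
    as strings because a limit can also be reported as 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: Max open files <soft> <hard> files
            fields = line.split()
            return fields[3], fields[4]
    return None


def limits_for_pid(pid):
    """Read the open-file limits of a running process (Linux only)."""
    return max_open_files(Path(f"/proc/{pid}/limits").read_text())


if __name__ == "__main__":
    # Hypothetical usage: python check_limits.py <pid> [<pid> ...]
    for pid in sys.argv[1:]:
        print(pid, limits_for_pid(pid))
```

If the workers report the same value as the shell, the file descriptor
limit can be ruled out with more confidence.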



On Wed, Jun 20, 2012 at 12:54 PM, Payam Chychi <pchychi at gmail.com> wrote:
> What's your fs ulimit set to?
>
> On 2012-06-19, at 9:52 PM, Mauro Stettler <mauro.stettler at gmail.com> wrote:
>
>> Hi list
>>
>> I have a problem with dropped connections on an Nginx cluster that has
>> up to 100k requests per minute per Nginx instance. For roughly 1 in
>> 10,000 requests sent to our Nginx, the TCP connection just gets reset
>> by the server. At first I was guessing that
>> some values in the /etc/sysctl.conf are maybe causing this problem,
>> because we have modified multiple TCP related values there. But after
>> resetting all of them to the default, the connection resets still kept
>> happening.
>>
>> I am guessing the problem must be related to Nginx and not to a kernel
>> setting because in our traffic only around 25% of all requests are
>> POSTs and the rest are GETs, but more than 90% of the requests where
>> the problem appears are POSTs. I don't think the kernel is aware of
>> whether a request is a POST or a GET.
>>
>> The problem happens on many different URLs, mostly ones where we POST
>> to, so it does not seem to be related to any rewrite rules.
>>
>> I have tcpdumped the problem and I can see that the request was sent
>> correctly by the client. But after the request was received by the
>> Nginx, it only sends back a packet with the ACK and FIN flags set. So
>> the connection gets killed and most of the browsers display some empty
>> pages or "zero sized reply" errors. The fact that the FIN is sent by
>> the server makes me assume that the problem cannot be related to
>> network hardware. Also we have this problem on all Nginx instances
>> inside that cluster, so I don't think it's related to broken networking
>> hardware.
>>
>> When the problem happens, I see entries like this one in the access
>> log. As you can see, Nginx logs both the HTTP status code and the
>> response length as 0:
>> <ip> - - [20/Jun/2012:04:13:23 +0200] "POST
>> /userProfile/rateResult?userId=<id>&_csrf_token=7e23ef60c67800c4765d32b0536fc536&rate=5
>> HTTP/1.1" 0 0 "<referer>" "Mozilla/5.0 (X11; U; Linux x86_64; en-US;
>> rv:1.9.1.6) Gecko/20091216 Mandriva Linux/1.9.1.6-0.1mdv2010.0
>> (2010.0) Firefox/3.5.6"
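
To check the POST/GET split of the affected requests directly from the
access log, entries with status 0 can be tallied by request method.
This is a rough sketch, assuming the log uses the default "combined"
format with one entry per line (the entry above is only wrapped by the
mail client); the function name and the log path in the usage note are
assumptions.

```python
import re
from collections import Counter

# Matches the quoted request line followed by status and body size,
# e.g. "POST /some/path HTTP/1.1" 0 0
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) \S+ [^"]*" (?P<status>\d+) (?P<size>\d+)'
)


def tally_zero_status(lines):
    """Count access log entries logged with status 0, grouped by method."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group("status") == "0":
            counts[match.group("method")] += 1
    return counts
```

For example, `tally_zero_status(open("/var/log/nginx/access.log"))`
would return a Counter keyed by method, which can be compared against
the overall share of POSTs in the traffic.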
>>
>> What I also find very interesting is that the problem can happen at
>> any time, so it does not seem to be related to the load or number of
>> requests on the Nginx. In the morning hours we have less than 5% of
>> the traffic of the evening hours, and still I sometimes see this
>> problem appearing in the morning.
>>
>> My Nginx config is too long to post here in full, so I am only
>> posting the parts which I think might be important, without all the
>> rewrite rules:
>>
>> user wwwrun www;
>> worker_processes 64;
>> worker_rlimit_nofile 524288;
>>
>> events {
>>    worker_connections 32768;
>>    use epoll;
>>    multi_accept on;
>> }
>>
>> http {
>>    sendfile on;
>>    tcp_nopush on;
>>    keepalive_requests 0;
>>    recursive_error_pages on;
>>    large_client_header_buffers 4 16k;
>>
>> What I also found via tcpdump is that on the requests where this
>> problem appears, the Nginx receives the incoming request and then
>> sends the correct request to the FastCGI backend. The backend returns
>> the correct answer in less than 300ms, but before that answer
>> arrives, the Nginx has already reset the client's connection.
>>
>> Just in case it matters, this is my sysctl.conf:
>>
>> net.ipv4.icmp_echo_ignore_broadcasts = 1
>> net.ipv4.conf.all.rp_filter = 1
>> fs.inotify.max_user_watches = 65536
>> net.ipv4.conf.default.promote_secondaries = 1
>> net.ipv4.conf.all.promote_secondaries = 1
>> net.ipv4.ip_forward = 0
>> net.ipv4.conf.lo.arp_ignore = 1
>> net.ipv4.conf.lo.arp_announce = 2
>> net.ipv4.conf.all.arp_ignore = 1
>> net.ipv4.conf.all.arp_announce = 2
>> net.netfilter.nf_conntrack_max = 262144
>> net.nf_conntrack_max = 262144
>> net.ipv4.tcp_max_syn_backlog = 30000
>> net.ipv4.tcp_max_tw_buckets = 2000000
>> net.core.netdev_max_backlog = 50000
>> net.ipv4.tcp_tw_reuse = 0
>> net.ipv4.tcp_tw_recycle = 1
>> net.ipv4.tcp_fin_timeout = 3
>> net.ipv4.tcp_keepalive_time = 120
>> net.core.wmem_max = 8388608
>> net.core.rmem_max = 8388608
>> net.ipv4.tcp_rmem = 4096 87380 8388608
>> net.ipv4.tcp_wmem = 4096 87380 8388608
>> net.core.somaxconn = 1024
>> kernel.pid_max = 65536
>> net.ipv4.conf.all.log_martians = 0
>> net.ipv4.conf.default.log_martians = 0
>> net.ipv4.conf.lo.log_martians = 0
>> net.ipv4.conf.eth0.log_martians = 0
>> net.ipv4.conf.eth1.log_martians = 0
>>
>> Our operating system is SuSE Linux Enterprise 11.0. The Nginx
>> configure params are the following:
>>
>> nginx version: nginx/1.2.1
>> built by gcc 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux)
>> configure arguments: --prefix=/usr/local/nginx-1.2.1
>> --error-log-path=/var/log/nginx/error.log
>> --http-log-path=/var/log/nginx/access.log
>> --with-http_stub_status_module --without-http_autoindex_module
>> --without-http_geo_module --without-http_map_module
>> --without-http_referer_module --without-http_limit_conn_module
>> --without-http_empty_gif_module --without-mail_pop3_module
>> --without-mail_imap_module --without-mail_smtp_module
>> --with-http_geoip_module --with-pcre=/usr/local/src/nginx/pcre-8.30
>> --add-module=3rd/agentzh-nginx-eval-module-4eb2a02
>> --add-module=3rd/ngx_http_log_request_speed
>> --add-module=3rd/replay-ngx_http_generate_secure_download_links-4c1a46a
>> --add-module=3rd/agentzh-memc-nginx-module-8befc56
>> --add-module=3rd/agentzh-echo-nginx-module-080c0a1
>> --add-module=3rd/replay-ngx_http_php_memcache_standard_balancer-4f7dcba
>> --add-module=3rd/masterzen-nginx-upload-progress-module-a788dea
>> --add-module=3rd/replay-ngx_http_php_session-30f69b3
>> --add-module=3rd/simpl-ngx_devel_kit-24202b4
>> --add-module=3rd/chaoslawful-lua-nginx-module-c5be5ff
>> --add-module=3rd/replay-ngx_http_lower_upper_case-44958e0
>> --add-module=3rd/gnosek-nginx-upstream-fair-a18b409
>>
>> In the dmesg output I cannot see anything suspicious; there are no
>> segfaults or network-related messages.
>>
>> I have already tried setting the Nginx error log to a more verbose
>> log level, but I didn't see anything related to my problem, even at
>> times when I saw that the problem was happening.
>>
>> Now I don't really know what else to check anymore... I would be
>> really glad if somebody had some ideas.
>>
>> Thanks for your help,
>>
>> Mauro
>>
>> _______________________________________________
>> nginx mailing list
>> nginx at nginx.org
>> http://mailman.nginx.org/mailman/listinfo/nginx


