Dropped connections on our Nginx
Mauro Stettler
mauro.stettler at gmail.com
Wed Jun 20 11:57:13 UTC 2012
In the meantime I found that there is a log entry when this problem happens:
2012/06/20 13:14:33 [alert] 4340#0: *23199 http request count is zero
while sending to client, client: <ip>, server: <server>, request:
"POST /userProfile/updateVisitCountAndVisitHistory/user_id/<id>/gender/w/nickname/geilpopperrin/membership/normal/hash/bb7e2305b293458dc8e9043486a6a455
HTTP/1.1", upstream: "fastcgi://10.20.0.159:9000", host: "<host>",
referrer: "<referrer>"
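To check whether these alerts cluster on a particular method or backend, something like the following could tally them from the error log. This is only a rough sketch: the function name and the sample lines are made up for illustration, and it assumes each log entry sits on a single line with the `request:` and `upstream:` fields shown above.

```python
import re
from collections import Counter

ALERT = "http request count is zero"
# Extract the HTTP method and the upstream address from one error-log line.
REQUEST_RE = re.compile(r'request: "(?P<method>[A-Z]+) ')
UPSTREAM_RE = re.compile(r'upstream: "(?P<upstream>[^"]+)"')


def tally_alerts(lines):
    """Count matching alerts per (method, upstream) pair."""
    counts = Counter()
    for line in lines:
        if ALERT not in line:
            continue  # skip unrelated log entries
        method = REQUEST_RE.search(line)
        upstream = UPSTREAM_RE.search(line)
        key = (
            method.group("method") if method else "?",
            upstream.group("upstream") if upstream else "?",
        )
        counts[key] += 1
    return counts


# Made-up sample input (placeholder client, server and path).
sample = [
    '2012/06/20 13:14:33 [alert] 4340#0: *23199 http request count is zero '
    'while sending to client, client: 192.0.2.1, server: example, '
    'request: "POST /userProfile/update HTTP/1.1", '
    'upstream: "fastcgi://10.20.0.159:9000", host: "example"',
    '2012/06/20 13:14:34 [error] 4340#0: *23200 some unrelated message',
]
print(tally_alerts(sample))
```

If nearly all alerts share one upstream or one method, that would narrow down which code path is involved.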
If I understand that error correctly, it means that when Nginx tried to
build the reply to the client, it had somehow lost track of which
request the reply belongs to? As far as I can see, the error is emitted
by this part of the source:
src/http/ngx_http_request.c (lines 2983-2985):
    if (r->count == 0) {
        ngx_log_error(NGX_LOG_ALERT, c->log, 0, "http request count is zero");
    }
What could cause r->count to be 0? Do I have to assume that there is a
bug in one of the third-party modules we use?
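For what it's worth, here is how I picture the counter, as a toy model in Python rather than Nginx's actual C code. The assumption (mine, not something the excerpt above confirms) is that r->count acts as a reference count: whoever keeps using the request takes a reference, and every close gives one back. The class and method names are hypothetical. If a module closed a request once more often than it took references, the counter would already be 0 at the next close, which is exactly the condition the alert checks for.

```python
class ToyRequest:
    """Toy stand-in for a request with a reference count (hypothetical names)."""

    def __init__(self):
        self.count = 1      # the connection handler holds one reference
        self.alerts = []    # messages that would end up in the error log
        self.closed = False

    def hold(self):
        """A module that keeps using the request takes a reference."""
        self.count += 1

    def close(self):
        """Give one reference back; alert if none are left to give."""
        if self.count == 0:
            # Same condition as the check in the excerpt.
            self.alerts.append("http request count is zero")
        else:
            self.count -= 1  # toy simplification: never go below zero
        if self.count == 0:
            self.closed = True


# Balanced use: one extra reference taken, two closes in total.
balanced = ToyRequest()
balanced.hold()
balanced.close()
balanced.close()

# Unbalanced use: a second close without a matching hold().
buggy = ToyRequest()
buggy.close()
buggy.close()

print(balanced.alerts, buggy.alerts)
```

In this model only the unbalanced sequence produces the alert, which is why I suspect a module that finalizes a request twice.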
Mauro
On Wed, Jun 20, 2012 at 1:08 PM, Mauro Stettler
<mauro.stettler at gmail.com> wrote:
> It's set to a high value:
>
> rsid-a-57:/etc/nginx # ulimit -n
> 524288
>
> I think if the problem were the fs ulimit, I would see an error
> saying that no more files can be opened, but there is no such
> error.
>
>
>
> On Wed, Jun 20, 2012 at 12:54 PM, Payam Chychi <pchychi at gmail.com> wrote:
>> What's your fs ulimit set to?
>>
>>
>> Sent from my iPhone
>>
>> On 2012-06-19, at 9:52 PM, Mauro Stettler <mauro.stettler at gmail.com> wrote:
>>
>>> Hi list
>>>
>>> I have a problem with dropped connections on an Nginx cluster that
>>> handles up to 100k requests per minute per Nginx instance. In roughly
>>> 1 in 10,000 requests sent to our Nginx, the TCP connection just gets
>>> reset by the server. At first I guessed that some values in
>>> /etc/sysctl.conf might be causing this problem, because we have
>>> modified multiple TCP-related values there. But after resetting all
>>> of them to the defaults, the connection resets kept happening.
>>>
>>> I am guessing the problem must be related to Nginx and not to a kernel
>>> setting, because only around 25% of all requests in our traffic are
>>> POSTs and the rest are GETs, yet more than 90% of the requests where
>>> the problem appears are POSTs. I don't think the kernel can be
>>> aware of whether a request is a POST or a GET.
>>>
>>> The problem happens on many different URLs, mostly ones where we POST
>>> to, so it does not seem to be related to any rewrite rules.
>>>
>>> I have captured the problem with tcpdump and I can see that the
>>> request was sent correctly by the client. But after Nginx received
>>> the request, it only sent back a packet with the ACK and FIN flags
>>> set. So the connection gets killed and most browsers display an empty
>>> page or a "zero sized reply" error. The fact that the FIN is sent by
>>> the server makes me assume that the problem cannot be related to
>>> network hardware. We also have this problem on all Nginx instances
>>> inside that cluster, so I don't think it's related to broken
>>> networking hardware.
>>>
>>> When the problem happens, I see entries like this one in the access
>>> log. As you can see, Nginx logs both the HTTP status code and the
>>> length as 0:
>>> <ip> - - [20/Jun/2012:04:13:23 +0200] "POST
>>> /userProfile/rateResult?userId=<id>&_csrf_token=7e23ef60c67800c4765d32b0536fc536&rate=5
>>> HTTP/1.1" 0 0 "<referer>" "Mozilla/5.0 (X11; U; Linux x86_64; en-US;
>>> rv:1.9.1.6) Gecko/20091216 Mandriva Linux/1.9.1.6-0.1mdv2010.0
>>> (2010.0) Firefox/3.5.6"
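
The POST share mentioned earlier can be re-checked against the access log with something like this. It is a rough sketch only: the function name and the sample lines are invented for illustration, and it assumes the log format shown above, with the quoted request followed by the status and the length.

```python
import re
from collections import Counter

# Quoted request line, then status code, then response length.
LINE_RE = re.compile(r'"(?P<method>[A-Z]+) [^"]*" (?P<status>\d+) (?P<size>\d+)')


def method_shares(lines):
    """Return (all_requests, dropped_requests) Counters keyed by HTTP method."""
    total, dropped = Counter(), Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # line does not look like an access-log entry
        total[m.group("method")] += 1
        if int(m.group("status")) == 0:
            dropped[m.group("method")] += 1  # status 0 marks a dropped request
    return total, dropped


# Invented sample lines (placeholder addresses and paths).
sample = [
    '192.0.2.1 - - [20/Jun/2012:04:13:23 +0200] '
    '"POST /userProfile/rateResult?rate=5 HTTP/1.1" 0 0 "-" "Mozilla/5.0"',
    '192.0.2.2 - - [20/Jun/2012:04:13:24 +0200] '
    '"GET /index HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.3 - - [20/Jun/2012:04:13:25 +0200] '
    '"POST /login HTTP/1.1" 200 128 "-" "Mozilla/5.0"',
]
total, dropped = method_shares(sample)
print(total, dropped)
```

Comparing the two counters over a full day of logs would show whether POSTs really are over-represented among the dropped requests.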
>>>
>>> What I also find very interesting is that the problem can happen at
>>> any time, so it does not seem to be related to the load or number of
>>> requests on the Nginx. In the morning hours we have less than 5% of
>>> the traffic of the evening hours, and I still sometimes see this
>>> problem appearing in the morning.
>>>
>>> My Nginx config is too long to post here in full, so I am only
>>> posting the parts that I think might be important, without all the
>>> rewrite rules:
>>>
>>> user wwwrun www;
>>> worker_processes 64;
>>> worker_rlimit_nofile 524288;
>>>
>>> events {
>>>     worker_connections 32768;
>>>     use epoll;
>>>     multi_accept on;
>>> }
>>>
>>> http {
>>>     sendfile on;
>>>     tcp_nopush on;
>>>     keepalive_requests 0;
>>>     recursive_error_pages on;
>>>     large_client_header_buffers 4 16k;
>>>
>>> What I also found via tcpdump is that on the requests where this
>>> problem appears, Nginx receives the incoming request, sends the
>>> correct request to the FastCGI backend, and the backend returns the
>>> correct answer. But before that answer arrives (the backend takes
>>> less than 300ms), Nginx has already reset the client's connection.
>>>
>>> Just in case this matters anyhow, this is my sysctl.conf:
>>>
>>> net.ipv4.icmp_echo_ignore_broadcasts = 1
>>> net.ipv4.conf.all.rp_filter = 1
>>> fs.inotify.max_user_watches = 65536
>>> net.ipv4.conf.default.promote_secondaries = 1
>>> net.ipv4.conf.all.promote_secondaries = 1
>>> net.ipv4.ip_forward = 0
>>> net.ipv4.conf.lo.arp_ignore = 1
>>> net.ipv4.conf.lo.arp_announce = 2
>>> net.ipv4.conf.all.arp_ignore = 1
>>> net.ipv4.conf.all.arp_announce = 2
>>> net.netfilter.nf_conntrack_max = 262144
>>> net.nf_conntrack_max = 262144
>>> net.ipv4.tcp_max_syn_backlog = 30000
>>> net.ipv4.tcp_max_tw_buckets = 2000000
>>> net.core.netdev_max_backlog = 50000
>>> net.ipv4.tcp_tw_reuse = 0
>>> net.ipv4.tcp_tw_recycle = 1
>>> net.ipv4.tcp_fin_timeout = 3
>>> net.ipv4.tcp_keepalive_time = 120
>>> net.core.wmem_max = 8388608
>>> net.core.rmem_max = 8388608
>>> net.ipv4.tcp_rmem = 4096 87380 8388608
>>> net.ipv4.tcp_wmem = 4096 87380 8388608
>>> net.core.somaxconn = 1024
>>> kernel.pid_max = 65536
>>> net.ipv4.conf.all.log_martians = 0
>>> net.ipv4.conf.default.log_martians = 0
>>> net.ipv4.conf.lo.log_martians = 0
>>> net.ipv4.conf.eth0.log_martians = 0
>>> net.ipv4.conf.eth1.log_martians = 0
>>>
>>> Our operating system is SuSE Linux Enterprise 11.0. The Nginx
>>> configure params are the following:
>>>
>>> nginx version: nginx/1.2.1
>>> built by gcc 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux)
>>> configure arguments: --prefix=/usr/local/nginx-1.2.1
>>> --error-log-path=/var/log/nginx/error.log
>>> --http-log-path=/var/log/nginx/access.log
>>> --with-http_stub_status_module --without-http_autoindex_module
>>> --without-http_geo_module --without-http_map_module
>>> --without-http_referer_module --without-http_limit_conn_module
>>> --without-http_empty_gif_module --without-mail_pop3_module
>>> --without-mail_imap_module --without-mail_smtp_module
>>> --with-http_geoip_module --with-pcre=/usr/local/src/nginx/pcre-8.30
>>> --add-module=3rd/agentzh-nginx-eval-module-4eb2a02
>>> --add-module=3rd/ngx_http_log_request_speed
>>> --add-module=3rd/replay-ngx_http_generate_secure_download_links-4c1a46a
>>> --add-module=3rd/agentzh-memc-nginx-module-8befc56
>>> --add-module=3rd/agentzh-echo-nginx-module-080c0a1
>>> --add-module=3rd/replay-ngx_http_php_memcache_standard_balancer-4f7dcba
>>> --add-module=3rd/masterzen-nginx-upload-progress-module-a788dea
>>> --add-module=3rd/replay-ngx_http_php_session-30f69b3
>>> --add-module=3rd/simpl-ngx_devel_kit-24202b4
>>> --add-module=3rd/chaoslawful-lua-nginx-module-c5be5ff
>>> --add-module=3rd/replay-ngx_http_lower_upper_case-44958e0
>>> --add-module=3rd/gnosek-nginx-upstream-fair-a18b409
>>>
>>> In dmesg I cannot see anything suspicious; there are no segfaults
>>> or networking-related messages.
>>>
>>> I have already tried setting the Nginx error log to a high log
>>> level, but I didn't see anything related to my problem, even at times
>>> when I saw that the problem was happening.
>>>
>>> Now I don't really know what else to check. I would be really glad
>>> if somebody had some ideas.
>>>
>>> Thanks for help,
>>>
>>> Mauro
>>>
>>> _______________________________________________
>>> nginx mailing list
>>> nginx at nginx.org
>>> http://mailman.nginx.org/mailman/listinfo/nginx