Weird 0.8.11.1 connections spike
Jim Ohlstein
jim at ohlste.in
Mon Aug 31 06:55:57 MSD 2009
Igor Sysoev wrote:
> On Sun, Aug 30, 2009 at 11:52:51AM -0400, Jim Ohlstein wrote:
>
>
>>>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #1023 left in connection 1015
>>>> 2009/08/30 10:29:00 [alert] 2042#0: aborting
>>>>
>>>> Other servers seem to be running fine including ones with busy sites.
>>>> For the moment I have reverted that VPS to 0.8.10.
>>>>
>>>>
>>> Could you do the following:
>>>
>>> 1) enable coredumps
>>> 2) set in nginx.conf:
>>> debug_points abort;
>>> 3) reconfigure nginx; if there are still open connections, nginx creates
>>> a coredump on exit
>>>
>>>
>> Do you want nginx reconfigured "--with-debug" or is there another option
>> you need?
>>
>
> No. The coredump is enough; it just should have debug info (gcc -g option).
>
>
>>> 4) look in log for alerts: open socket #... left in connection NN
>>> 5) run "gdb /path/to/nginx /path/to/core", then
>>>
>>> p ((ngx_connection_t *) ngx_cycle->connections[NN]->data)->uri
>>> p ((ngx_connection_t *) ngx_cycle->connections[NN]->data)->main->count
>>>
>>> where NN is NN from log message.
>>>
Unfortunately, I don't think it gave much information.
I watched connections gradually rise. I have ulimit -n set to 1024, two
workers, 1024 connections/worker. As connections neared 2048 the site
became unresponsive and load went up dramatically.
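The figures above can be checked with a little arithmetic. This is only a rough sketch: it assumes every worker inherits the same "ulimit -n" and needs one descriptor per client connection, plus a few for logs and listening sockets. The "reserved" allowance of 16 is a made-up placeholder, not a value taken from nginx.

```python
import resource  # Unix-only standard library module


def usable_connections(fd_limit, worker_connections, reserved=16):
    """Connections one worker can hold before it runs out of descriptors.

    `reserved` is a hypothetical allowance for log files, listening
    sockets and the like; it is an assumption, not an nginx constant.
    """
    return max(0, min(worker_connections, fd_limit - reserved))


def current_fd_limit():
    """Soft RLIMIT_NOFILE of this process (what `ulimit -n` reports)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


# Figures from the message: ulimit -n 1024, two workers,
# 1024 connections per worker.
workers = 2
per_worker = usable_connections(fd_limit=1024, worker_connections=1024)
print(per_worker, per_worker * workers)
```

Under these assumptions the descriptor limit, not the configured connection count, is what each worker hits first.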
I began to see the same errors in the log. Nginx did not abort on its
own, so I killed it after a few minutes. The error log then had entries
like:
2009/08/30 22:22:40 [alert] 6118#0: open socket #980 left in connection 993
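To collect the connection numbers (the NN values) from many such alerts at once, something like the following could be used. It is a minimal sketch that assumes the alert text always has the form shown above; the function name is hypothetical.

```python
import re

# Matches the alert text as it appears in the error log lines above.
ALERT_RE = re.compile(r"open socket #(\d+) left in connection (\d+)")


def leaked_connections(log_lines):
    """Return (socket_fd, connection_index) pairs found in alert lines."""
    pairs = []
    for line in log_lines:
        m = ALERT_RE.search(line)
        if m:
            pairs.append((int(m.group(1)), int(m.group(2))))
    return pairs


sample = [
    "2009/08/30 22:22:40 [alert] 6118#0: open socket #980 left in connection 993",
    "2009/08/30 10:29:00 [alert] 2042#0: aborting",
]
print(leaked_connections(sample))  # [(980, 993)]
```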
I ran gdb on the core but this was the output from three connections:
[root@mars proc]# gdb /vz/private/101/fs/root/usr/local/sbin/nginx ./kcore
GNU gdb Red Hat Linux (6.5-37.el5_2.2rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/libthread_db.so.1".
warning: core file may not match specified executable file.
Core was generated by `ro root=LABEL=/ console=tty0 console=ttyS1,19200n8 debug'.
#0 0x0000000000000000 in ?? ()
(gdb) p ((ngx_connection_t *) ngx_cycle->connections[1014]->data)->uri
Cannot access memory at address 0x130
(gdb) p ((ngx_connection_t *) ngx_cycle->connections[1014]->data)->uri
Cannot access memory at address 0x130
(gdb) p ((ngx_connection_t *) ngx_cycle->connections[1010]->data)->uri
Cannot access memory at address 0x130
(gdb) p ((ngx_connection_t *) ngx_cycle->connections[1014]->data)->main->count
Cannot access memory at address 0x130
(gdb) p ((ngx_connection_t *) ngx_cycle->connections[1010]->data)->main->count
Cannot access memory at address 0x130
(gdb) p ((ngx_connection_t *) ngx_cycle->connections[993]->data)->uri
Cannot access memory at address 0x130
(gdb) p ((ngx_connection_t *) ngx_cycle->connections[993]->data)->main->count
Cannot access memory at address 0x130
(gdb) quit
[root@mars proc]#
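The gdb startup text above includes a warning that the core file may not match the executable, and a "Core was generated by" line. A small check like the one below could flag such a session before anything is read into the print results. It is a sketch only: the function names are hypothetical, and testing for the substring "nginx" is a heuristic assumption.

```python
import re


def inspect_gdb_banner(gdb_output):
    """Summarise startup hints from a gdb session transcript.

    Returns any "warning:" lines and the command line gdb reports the
    core was generated by (None if that line is absent).
    """
    warnings = [ln.strip() for ln in gdb_output.splitlines()
                if ln.strip().startswith("warning:")]
    # re.S lets the match span a command line wrapped across two lines.
    m = re.search(r"Core was generated by `(.*?)'", gdb_output, re.S)
    generated_by = " ".join(m.group(1).split()) if m else None
    return {"warnings": warnings, "generated_by": generated_by}


def core_looks_like(gdb_output, program="nginx"):
    """Heuristic: does the reported command line mention the program?"""
    generated_by = inspect_gdb_banner(gdb_output)["generated_by"]
    return bool(generated_by) and program in generated_by


# Lines copied from the session above.
sample = """warning: core file may not match specified executable file.
Core was generated by `ro root=LABEL=/ console=tty0
console=ttyS1,19200n8 debug'.
#0 0x0000000000000000 in ?? ()
"""
print(inspect_gdb_banner(sample))
print(core_looks_like(sample))
```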
During this time there were hundreds of connections in "CLOSE_WAIT"
state. They gradually increased to just over 1000 when it crashed.
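For anyone wanting to track that growth over time, socket states can be tallied from text in the Linux /proc/net/tcp format. This is a minimal sketch under that assumption (IPv4 only); the two sample rows are made up for illustration and are not taken from the affected server.

```python
from collections import Counter

# Linux kernel TCP state codes as printed in the "st" column.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from /proc/net/tcp-style text."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # Data rows start with "N:"; this also skips the header line.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        state = fields[3].upper()
        counts[TCP_STATES.get(state, state)] += 1
    return counts


# Made-up sample rows: one listening socket, one in CLOSE_WAIT.
sample = """  sl  local_address rem_address   st tx_queue rx_queue
   0: 00000000:0050 00000000:0000 0A 00000000:00000000
   1: 0100007F:0050 0100007F:C350 08 00000000:00000000
"""
print(count_tcp_states(sample)["CLOSE_WAIT"])
```

On a live Linux system the same function could be fed the contents of /proc/net/tcp, read at intervals, to watch the CLOSE_WAIT count.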
Jim