Weird 0.8.11.1 connections spike

Igor Sysoev is at rambler-co.ru
Mon Aug 31 09:24:13 MSD 2009


On Sun, Aug 30, 2009 at 10:55:57PM -0400, Jim Ohlstein wrote:

> Igor Sysoev wrote:
> >On Sun, Aug 30, 2009 at 11:52:51AM -0400, Jim Ohlstein wrote:
> >
> >  
> >>>>2009/08/30 10:29:00 [alert] 2042#0: open socket #1023 left in connection 1015
> >>>>2009/08/30 10:29:00 [alert] 2042#0: aborting
> >>>>
> >>>>Other servers seem to be running fine including ones with busy sites. 
> >>>>For the moment I have reverted that VPS to 0.8.10.
> >>>>   
> >>>>        
> >>>Could you do the following:
> >>>
> >>>1) enable coredumps
> >>>2) set in nginx.conf:
> >>>  debug_points  abort;
> >>>3) reconfigure nginx; if there are leaked open connections, nginx
> >>>  creates a coredump on exit
> >>> 
> >>>      
> >>Do you want nginx reconfigured "--with-debug" or is there another option 
> >>you need?
> >>    
> >
> >No. The coredump is enough; nginx just has to be built with debug info
> >(gcc -g option).
> >
> >  
> >>>4) look in log for alerts: open socket #... left in connection NN
> >>>5) run "gdb /path/to/nginx /path/to/core", then
> >>>
> >>>  p ((ngx_connection_t *) ngx_cycle->connections[NN]->data)->uri
> >>>  p ((ngx_connection_t *) ngx_cycle->connections[NN]->data)->main->count
> >>>
> >>>  where NN is NN from log message.
> >>>      
> 
> Unfortunately I don't think it gave too much information.
> 
> I watched connections gradually rise. I have ulimit -n set to 1024, two 
> workers, 1024 connections/worker. As connections neared 2048 the site 
> became unresponsive and load went up dramatically.
> 
> I began to see the same errors in the log. Nginx did not abort on its 
> own so I killed it after a few minutes. I then saw the same entries in 
> the error log like:
> 
> 2009/08/30 22:22:40 [alert] 6118#0: open socket #980 left in connection 993

nginx aborts only when you send it -HUP and it finds leaked connections.
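
A minimal sketch of that step, assuming the master process PID is stored
in a pid file. The path in the usage note is hypothetical and the helper
names are only for illustration; it does the same thing as running
"kill -HUP" on the master by hand.

```python
import os
import signal


def read_master_pid(pid_file):
    """Read the nginx master PID from a pid file (path is an assumption)."""
    with open(pid_file) as f:
        return int(f.read().strip())


def send_hup(pid_file):
    """Send SIGHUP to the master so nginx reconfigures and old workers exit."""
    pid = read_master_pid(pid_file)
    os.kill(pid, signal.SIGHUP)
    return pid
```

For example, send_hup("/path/to/nginx.pid"). With "debug_points abort;"
set, a worker that still holds leaked connections when it exits should
then abort and leave a core file.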

> I ran gdb on the core but this was the output from three connections:
> 
> [root@mars proc]# gdb /vz/private/101/fs/root/usr/local/sbin/nginx ./kcore
> GNU gdb Red Hat Linux (6.5-37.el5_2.2rh)
> Copyright (C) 2006 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain 
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...Using host 
> libthread_db library "/lib64/libthread_db.so.1".
> 
> warning: core file may not match specified executable file.
> Core was generated by `ro root=LABEL=/ console=tty0 
> console=ttyS1,19200n8 debug'.
> #0  0x0000000000000000 in ?? ()
> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[1014]->data)->uri
> Cannot access memory at address 0x130
> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[1014]->data)->uri
> Cannot access memory at address 0x130
> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[1010]->data)->uri
> Cannot access memory at address 0x130
> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[1014]->data)->main->count
> Cannot access memory at address 0x130
> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[1010]->data)->main->count
> Cannot access memory at address 0x130
> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[993]->data)->uri
> Cannot access memory at address 0x130
> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[993]->data)->main->count
> Cannot access memory at address 0x130
> (gdb) quit
> [root@mars proc]#
> 
> During this time there were hundreds of connections in "CLOSE_WAIT" 
> state. They gradually increased to just over 1000 when it crashed.
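
The CLOSE_WAIT growth described above can be tracked with a small
script. This is only a sketch, assuming a Linux host where the IPv4
socket table is readable as /proc/net/tcp (state code 08 there is
CLOSE_WAIT); the function name is made up for illustration, and IPv6
sockets in /proc/net/tcp6 are not counted.

```python
def count_close_wait(proc_net_tcp_text):
    """Count sockets in CLOSE_WAIT in the text of /proc/net/tcp."""
    count = 0
    # The first line is the column header; the 4th column ("st") is the
    # TCP state as a hex code, and 08 means CLOSE_WAIT.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) > 3 and fields[3] == "08":
            count += 1
    return count


if __name__ == "__main__":
    with open("/proc/net/tcp") as f:
        print(count_close_wait(f.read()))
```

Running it every few seconds shows whether the count keeps climbing
toward the worker connection limit.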

Sorry, I made a mistake. The commands should be:

p ((ngx_http_request_t *) ngx_cycle->connections[1014].data)->uri
p ((ngx_http_request_t *) ngx_cycle->connections[1014].data)->main->count
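
To avoid typing these for every leaked connection, the commands can be
generated from the error log. The following is a rough sketch, not part
of nginx: the helper names are made up, and it assumes the alert text
has exactly the "open socket #N left in connection NN" form shown in
the log lines quoted above.

```python
import re

# Matches the alert text from the error log; group 2 is the connection index NN.
ALERT_RE = re.compile(r"open socket #(\d+) left in\s+connection (\d+)")


def leaked_connections(log_text):
    """Return the distinct connection indexes NN, in order of appearance."""
    seen = []
    for match in ALERT_RE.finditer(log_text):
        nn = int(match.group(2))
        if nn not in seen:
            seen.append(nn)
    return seen


def gdb_commands(indexes):
    """Build the two gdb print commands for each connection index."""
    commands = []
    for nn in indexes:
        request = f"((ngx_http_request_t *) ngx_cycle->connections[{nn}].data)"
        commands.append(f"p {request}->uri")
        commands.append(f"p {request}->main->count")
    return commands


if __name__ == "__main__":
    sample = ("2009/08/30 22:22:40 [alert] 6118#0: "
              "open socket #980 left in connection 993\n")
    print("\n".join(gdb_commands(leaked_connections(sample))))
```

The printed lines can be pasted into a gdb session opened on the nginx
binary and the core file.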


-- 
Igor Sysoev
http://sysoev.ru/en/




