Weird 0.8.11.1 connections spike

Jim Ohlstein jim at ohlste.in
Mon Aug 31 16:14:12 MSD 2009


Igor Sysoev wrote:
> On Sun, Aug 30, 2009 at 10:55:57PM -0400, Jim Ohlstein wrote:
>
>> Igor Sysoev wrote:
>>     
>>> On Sun, Aug 30, 2009 at 11:52:51AM -0400, Jim Ohlstein wrote:
>>>
>>>>>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #1023 left in connection 1015
>>>>>> 2009/08/30 10:29:00 [alert] 2042#0: aborting
>>>>>>
>>>>>> Other servers seem to be running fine including ones with busy sites. 
>>>>>> For the moment I have reverted that VPS to 0.8.10.
>>>>>>   
>>>>> Could you do the following:
>>>>>
>>>>> 1) enable coredumps
>>>>> 2) set in nginx.conf:
>>>>>  debug_points  abort;
>>>>> 3) reconfigure nginx; if there are open connections, nginx creates a
>>>>>  coredump on exit
>>>>>
>>>>>      
>>>> Do you want nginx reconfigured "--with-debug" or is there another option 
>>>> you need?
>>>>    
>>> No. The coredump is enough; it just needs to have debug info (gcc -g
>>> option).
>>>
>>>  
>>>>> 4) look in log for alerts: open socket #... left in connection NN
>>>>> 5) run "gdb /path/to/nginx /path/to/core", then
>>>>>
>>>>>  p ((ngx_connection_t *) ngx_cycle->connections[NN]->data)->uri
>>>>>  p ((ngx_connection_t *) ngx_cycle->connections[NN]->data)->main->count
>>>>>
>>>>>  where NN is the NN from the log message.
>>>>>      
>> Unfortunately I don't think it gave too much information.
>>
>> I watched connections gradually rise. I have ulimit -n set to 1024, two 
>> workers, 1024 connections/worker. As connections neared 2048 the site 
>> became unresponsive and load went up dramatically.
>>
>> I began to see the same errors in the log. Nginx did not abort on its
>> own, so I killed it after a few minutes. I then saw the same kind of
>> entries in the error log, such as:
>>
>> 2009/08/30 22:22:40 [alert] 6118#0: open socket #980 left in connection 993
>>     
>
> nginx aborts only when you send -HUP and it finds leaked connections.
>
>   
>> I ran gdb on the core, but this was the output for three connections:
>>
>> [root at mars proc]# gdb /vz/private/101/fs/root/usr/local/sbin/nginx ./kcore
>> GNU gdb Red Hat Linux (6.5-37.el5_2.2rh)
>> Copyright (C) 2006 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain 
>> conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu"...Using host 
>> libthread_db library "/lib64/libthread_db.so.1".
>>
>> warning: core file may not match specified executable file.
>> Core was generated by `ro root=LABEL=/ console=tty0 
>> console=ttyS1,19200n8 debug'.
>> #0  0x0000000000000000 in ?? ()
>> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[1014]->data)->uri
>> Cannot access memory at address 0x130
>> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[1014]->data)->uri
>> Cannot access memory at address 0x130
>> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[1010]->data)->uri
>> Cannot access memory at address 0x130
>> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[1014]->data)->main->count
>> Cannot access memory at address 0x130
>> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[1010]->data)->main->count
>> Cannot access memory at address 0x130
>> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[993]->data)->uri
>> Cannot access memory at address 0x130
>> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[993]->data)->main->count
>> Cannot access memory at address 0x130
>> (gdb) quit
>> [root at mars proc]#
>>
>> During this time there were hundreds of connections in "CLOSE_WAIT" 
>> state. They gradually increased to just over 1000 when it crashed.
>>     
>
> Sorry, I made a mistake; it should be:
>
> p ((ngx_http_request_t *) ngx_cycle->connections[1014].data)->uri
> p ((ngx_http_request_t *) ngx_cycle->connections[1014].data)->main->count
>
>
>   
It looks as though you got the data that you needed overnight in my time 
zone. That server does use a try_files directive:

location /forums/ {
    try_files  $uri  $uri/  /forums/vbseo.php;
    ...
}

Previously we used a rewrite:

#if (!-e $request_filename) {
#rewrite ^/forums/(.*)$ /forums/vbseo.php last;
#}

which ironically would probably not have caused this difficulty.

I'll try 0.8.12 and report any difficulties, unless you want me to
generate another coredump with 0.8.11.

Jim
