Weird 0.8.11.1 connections spike

Sun Aug 30 19:52:51 MSD 2009

Igor Sysoev wrote:
> On Sun, Aug 30, 2009 at 11:00:10AM -0400, Jim Ohlstein wrote:
>
>   
>> Igor Sysoev wrote:
>>     
>>> On Sun, Aug 30, 2009 at 01:45:30AM +1000, Jeff Waugh wrote:
>>>
>>>  
>>>       
>>>> <quote who="Igor Sysoev">
>>>>
>>>>    
>>>>         
>>>>>> 2009/08/29 23:15:52 [alert] 8716#0: open socket #152 left in connection 
>>>>>> 105
>>>>>>        
>>>>>>             
>>>>> This is probably after reconfiguraiton.
>>>>>      
>>>>>           
>>>> Sorry, yes, I should've checked the time -> that was from when I reverted 
>>>> to
>>>> 0.8.10... there no alerts prior to that.
>>>>
>>>>    
>>>>         
>>>>> There are signinficant changes in 0.8.11, but you case is strange: I'am
>>>>> testing 0.8.11 on a host with 400r/s with a lot of proxied SSIed
>>>>> subrequests and static files, and there is no connection leak.
>>>>>
>>>>> Have you built any third party modules ?
>>>>>      
>>>>>           
>>>> None.
>>>>
>>>> btw, the majority of my non-static traffic is fastcgi over unix socket. 
>>>> I'm
>>>> going to do the upgrade again and watch the activity.
>>>>    
>>>>         
>>> Could you do the following:
>>>
>>> 1) enable coredumps
>>> 2) set in nginx.conf:
>>>   debug_points  abort;
>>> 3) reconfigure nginx, if there are open connections, then nginx creates
>>>   coredump on exit
>>> 4) look in log for alerts: open socket #... left in connection NN
>>> 5) run "gdb /path/to/nginx /path/to/core", then
>>>
>>>   p ((ngx_connection_t *) ngx_cycle->connections[NN]->data)->uri
>>>   p ((ngx_connection_t *) ngx_cycle->connections[NN]->data)->main->count
>>>
>>>   do not forget to set NN from log message.
>>>
>>>
>>>  
>>>       
>> Igor,
>>
>> On one server I seem to be seeing the same issue. Twice since upgrading 
>> to 0.8.11 I have found the site unresponsive. It's a relatively slow 
>> vBulletin site on a very small VPS. We serve a fair bit of static 
>> content (photos mostly) and PHP via fastcgi.
>>
>> OS is CentOS 5.3 Linux  - 2.6.18-028stab062.3
>>
>> sample of error log (it grew to 36 GB in a few hours):
>>
>> 2009/08/30 07:50:20 [alert] 2043#0: accept() failed (24: Too many open 
>> files)
>> 2009/08/30 07:50:20 [alert] 2043#0: accept() failed (24: Too many open 
>> files)
>> 2009/08/30 07:50:20 [alert] 2043#0: accept() failed (24: Too many open 
>> files)
>> 2009/08/30 07:50:20 [alert] 2043#0: accept() failed (24: Too many open 
>> files)
>> 2009/08/30 07:50:20 [alert] 2043#0: accept() failed (24: Too many open 
>> files)
>> 2009/08/30 07:50:20 [alert] 2043#0: accept() failed (24: Too many open 
>> files)
>>
>> ...
>>
>> 2009/08/30 10:29:00 [alert] 2042#0: accept() failed (24: Too many open 
>> files)
>> 2009/08/30 10:29:00 [alert] 2042#0: accept() failed (24: Too many open 
>> files)
>> 2009/08/30 10:29:00 [alert] 2042#0: accept() failed (24: Too many open 
>> files)
>> 2009/08/30 10:29:00 [alert] 2042#0: accept() failed (24: Too many open 
>> files)
>> 2009/08/30 10:29:00 [alert] 2042#0: accept() failed (24: Too many open 
>> files)
>> 2009/08/30 10:29:00 [alert] 2042#0: accept() failed (24: Too many open 
>> files)
>> 2009/08/30 10:29:00 [alert] 2042#0: accept() failed (24: Too many open 
>> files)
>> 2009/08/30 10:29:00 [alert] 2042#0: accept() failed (24: Too many open 
>> files)
>> 2009/08/30 10:29:00 [alert] 2042#0: accept() failed (24: Too many open 
>> files)
>>
>> When I did shut it down I saw hundreds of lines like:
>>
>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #1010 left in connection 
>> 1002
>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #981 left in connection 1003
>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #994 left in connection 1004
>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #1017 left in connection 
>> 1005
>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #1012 left in connection 
>> 1006
>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #1011 left in connection 
>> 1007
>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #1016 left in connection 
>> 1008
>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #1014 left in connection 
>> 1009
>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #1019 left in connection 
>> 1010
>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #1020 left in connection 
>> 1011
>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #1013 left in connection 
>> 1012
>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #995 left in connection 1013
>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #988 left in connection 1014
>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #1023 left in connection 
>> 1015
>> 2009/08/30 10:29:00 [alert] 2042#0: aborting
>>
>> Other servers seem to be running fine including ones with busy sites. 
>> For the moment I have reverted that VPS to 0.8.10.
>>     
>
> Could you do the following:
>
> 1) enable coredumps
> 2) set in nginx.conf:
>    debug_points  abort;
> 3) reconfigure nginx, if there are open connections, then nginx creates
>    coredump on exit
>   

Do you want nginx reconfigured "--with-debug" or is there another option 
you need?

> 4) look in log for alerts: open socket #... left in connection NN
> 5) run "gdb /path/to/nginx /path/to/core", then
>
>    p ((ngx_connection_t *) ngx_cycle->connections[NN]->data)->uri
>    p ((ngx_connection_t *) ngx_cycle->connections[NN]->data)->main->count
>
>    where NN is NN from log message.
>
>
>   
Jim