Lots of CLOSE_WAIT sockets, nginx+php (WordPress site)

Vicente Aguilar bisente at bisente.com
Thu Feb 25 16:57:17 MSK 2010


Hi

> From the output you provided it looks like all nginx workers are 
> locked out, either doing something or waiting for some system 
> resources.  As you can see - all connections accepted by nginx (6 
> connections which have nginx process listed in pid column) are in 
> CLOSE_WAIT state, and there are other connections to port 80 which 
> are sitting in listen queue.  Am I right in the assumption that 
> nginx does not answer any requests?

Yes, that's the issue. nginx becomes unresponsive at this point until I restart it.

> Note well: you haven't posted full config you use, so please check 
> yourself for possible loops in it.  I've recently posted some 
> patches which take care of several loops which aren't automatically 
> resolved now, see here for patch and example loops:
> 
> http://nginx.org/pipermail/nginx-devel/2010-January/000099.html
> 
> It should be trivial to find if it's the cause though, as nginx 
> worker will eat 100% cpu once caught in such loop.

I have a monitoring script that detects these situations (a wget from localhost fails to download within a 20s timeout) and restarts nginx. Before restarting, it captures the output of netstat -nap and ps, plus other system metrics. This is an example of what ps shows:

www-data 24610  0.0  0.1   7476  2452 ?        S    07:44   0:00 nginx: worker process
www-data 24611  0.0  0.1   7668  2412 ?        S    07:44   0:00 nginx: worker process
www-data 24612  0.0  0.1   7668  2416 ?        S    07:44   0:00 nginx: worker process
www-data 24613  0.0  0.1   7736  2624 ?        S    07:44   0:00 nginx: worker process

And vmstat:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0    440 157012 181076 1180340    0    0     2    32   27   46  2  0 95  0
 0  0    440 156904 181076 1180340    0    0     0     0   26   28  2  0 94  0
 0  0    440 156888 181076 1180348    0    0     0     0   13   24  0  0 100  0
 0  0    440 156888 181076 1180348    0    0     0     0   12   21  0  0 100  0
 0  0    440 156888 181080 1180348    0    0     0   128   22   34  0  0 99  1

So the nginx processes don't seem to be stuck in a loop: every worker shows 0.0% CPU in ps, and vmstat reports the system as 94-100% idle.
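
In case it helps automate the "is a worker spinning?" check, here is a minimal Python sketch that scans captured ps output for nginx workers with high CPU. It assumes the "ps aux" column layout shown above (USER PID %CPU ...); the function name busy_nginx_workers and the 50% threshold are hypothetical choices for illustration, not part of my actual script.

```python
def busy_nginx_workers(ps_output, cpu_threshold=50.0):
    """Return [(pid, cpu_percent)] for nginx workers at or above cpu_threshold.

    Assumes "ps aux"-style lines:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    busy = []
    for line in ps_output.splitlines():
        if "nginx: worker process" not in line:
            continue  # skip the header, the master process and unrelated processes
        fields = line.split()
        pid, cpu = int(fields[1]), float(fields[2])
        if cpu >= cpu_threshold:
            busy.append((pid, cpu))
    return busy
```

An empty result for a capture taken while nginx is unresponsive would point away from a config loop and towards workers blocked waiting on something.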

> Note well 2: I've already asked you to try compiling without third 
> party modules and patches and check if you are able to reproduce 
> the problem.  It doesn't really make sense to proceed any further 
> without doing this.

I have to admit I still haven't tried this, sorry. :) Will try.

> You have to enable debug log (see 
> http://nginx.org/en/docs/debugging_log.html).  Then it will be 
> possible to map fd number to the particular request (and its full 
> logs).  Under linux it should be possible to find out fd number of 
> the particular connection via lsof -p <pid-of-nginx-worker>.

I will look into this too and add that info to the monitoring script. Can you think of any other system parameters that would be useful to monitor in these cases?
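
As a rough idea of what the script could record alongside the lsof output, here is a minimal Python sketch that counts CLOSE_WAIT sockets per owning pid from captured netstat -nap output. It assumes the usual Linux column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name); the function name close_wait_by_pid is hypothetical.

```python
from collections import Counter


def close_wait_by_pid(netstat_output, port=80):
    """Count CLOSE_WAIT TCP sockets on the given local port, keyed by pid.

    Sockets without a visible owner are counted under "-".
    """
    counts = Counter()
    suffix = ":%d" % port
    for line in netstat_output.splitlines():
        # Split into at most 7 fields so a program name containing
        # spaces (e.g. "nginx: worker") stays in the last field.
        fields = line.split(None, 6)
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        local, state, owner = fields[3], fields[5], fields[6].strip()
        if state != "CLOSE_WAIT" or not local.endswith(suffix):
            continue
        pid = owner.split("/", 1)[0]
        counts[pid] += 1
    return counts
```

Logging these counts on each run would show whether the CLOSE_WAIT sockets build up gradually on one worker or appear on all workers at once.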

Thanks a lot Maxim. You're being really helpful. :-)

Regards

-- 
 Vicente Aguilar <bisente at bisente.com> | http://www.bisente.com
