All workers in 'D' state using sendfile

Drew Wareham m3rlin at gmail.com
Sat May 12 10:28:14 UTC 2012


Hello,

I have tried to summarize this as much as possible, but it's still a lot of
text.  I apologize, but I wanted to make sure that I provided enough
information to explain the issue properly.

I'm hoping that somebody who uses nginx as a high traffic/concurrency
download server will be able to shed some light on this issue.  I've tried
as many things as I can think of and everything keeps pointing to it being
an issue with nginx, not the server - but I am of course more than willing
to try any suggestions provided.

*Background:*
Approx. 1,500 - 5,000 concurrent connections (off-peak / peak),
Files vary in size from 5MB to 2GB,
All downloads; only very small dynamic content scripts run on these servers
and none take more than 1-3 seconds,
Files are hosted on direct-attached AoE storage with a dedicated 10GE link,
Server is running nginx-1.0.11, php-fpm 5.3 and CentOS 5.8 x64
(2.6.18-308.4.1.el5.centos.plus).
Specs are: Dual Xeon E5649 (6 Core), 32GB RAM, 300GB 10k SAS HDD, AoE DAS
over 10GE
Download speeds are restricted by the PHP handoff using X-Accel-Redirect,
but obviously not when I'm testing ;)

*Issue:*
After running for a short but random period of time (5 to 90 minutes), all
nginx workers eventually end up in the 'D' state according to ps/top.
This causes all downloads to run extremely slowly (~25kb/s), but it doesn't
seem to be caused by I/O, because an scp of the same file completes at
the expected speed of ~750MB+/s.
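
For anyone who wants to check for the same symptom, here is a minimal
Python sketch of how the 'D' state workers could be listed by reading /proc
directly rather than watching ps/top.  The helper names (parse_stat,
blocked_workers) are made up for illustration, and it assumes a Linux /proc
with per-process stat and wchan files, a reasonably recent Python, and that
the workers show up under the process name "nginx".  Treat it as a sketch,
not something verified on this server.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_workers(name="nginx", proc="/proc"):
    """List (pid, wchan) for processes called `name` that are in 'D' state."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if comm != name or state != "D":
            continue
        # wchan names the kernel function the process is sleeping in,
        # when the kernel exposes it (assumption: /proc/<pid>/wchan exists).
        try:
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"
        found.append((pid, wchan))
    return found


if __name__ == "__main__":
    for pid, wchan in blocked_workers():
        print(pid, wchan)
```

Running it a few times while downloads are slow should show whether every
worker is sleeping in the same kernel function.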

I usually run with worker_processes set to 13, but I've had to raise this
to 50 to prevent the issue.  This works short term, but I'm guessing
eventually I will need to restart nginx to fix it.

*Config:*
I'm using sendfile with epoll, with the following events / http
settings (I've removed the location block with the fastcgi handler, etc.):

events {
    worker_connections      16384;
    use                     epoll;
}

http {
    ....

    sendfile                        on;
    tcp_nopush                      on;
    tcp_nodelay                     on;
    keepalive_timeout               0;

    ....

    location /internalAccess/ {
        internal;
        alias                   /data/;
    }
}



Kind Regards,

Drew