All workers in 'D' state using sendfile

Mon May 28 14:31:40 UTC 2012

If you comment out or remove "use epoll;", does that help?
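
A minimal sketch of that change, assuming an events block along these lines (the worker_connections value is a placeholder, since the original config is not shown in this thread):

```nginx
events {
    worker_connections  1024;   # placeholder value, not taken from the original config
    # use epoll;                # commented out, so nginx chooses the method itself
}
```

Without an explicit "use" directive nginx normally selects the most efficient connection processing method available on the platform, so this mainly tests whether the explicit setting makes any difference.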

On Mon, May 28, 2012 at 10:06 PM, 姚伟斌 <nbubingo at gmail.com> wrote:
> Hi Maxim,
>
> Is there any plan to develop a threaded version?
>
> 2012/5/12 Maxim Dounin <mdounin at mdounin.ru>:
>> Hello!
>>
>> On Sat, May 12, 2012 at 08:28:14PM +1000, Drew Wareham wrote:
>>
>>> Hello,
>>>
>>> I have tried to summarize this as much as possible but it's still a lot of
>>> text.  I apologize but wanted to make sure that I provide enough
>>> information to explain the issue properly.
>>>
>>> I'm hoping that somebody that uses nginx as a high traffic/concurrency
>>> download server will be able to shed some light on this issue.  I've tried
>>> as many things as I can think of and everything keeps pointing to it being
>>> an issue with nginx, not the server - but I am of course more than willing
>>> to try any suggestions provided.
>>>
>>> *Background:*
>>> Approx. 1,500 - 5,000 concurrent connections (peak / off-peak),
>>> Files vary in size from 5MB to 2GB,
>>> All downloads; only very small dynamic content scripts run on these servers
>>> and none take more than 1-3 seconds,
>>> Files are hosted on direct-attached AoE storage with a dedicated 10GE link,
>>> Server is running nginx-1.0.11, php-fpm 5.3 and CentOS 5.8 x64
>>> (2.6.18-308.4.1.el5.centos.plus).
>>> Specs are: Dual Xeon E5649 (6 Core), 32GB RAM, 300GB 10k SAS HDD, AoE DAS
>>> over 10GE
>>> Download speeds are restricted by the PHP handoff using X-Accel-Redirect,
>>> but obviously not when I'm testing ;)
>>>
>>> *Issue:*
>>> After running for a short, but random period of time (5min ~ 90min) all
>>> nginx workers will eventually end up in a 'D' state according to ps/top.
>>> This causes all downloads to run extremely slowly (~25kb/s) but it doesn't
>>> seem to be caused by I/O because an scp of the same file will complete at
>>> the expected speed of ~750MB+/s.
>>>
>>> I usually run with worker_processes set to 13, but I've had to raise this
>>> to 50 to prevent the issue.  This works short term, but I'm guessing
>>> eventually I will need to restart nginx to fix it.
>>>
>>> *Config:*
>>> I'm using sendfile with epoll, and using the following events / http
>>> settings (I've removed the location block with the fastcgi handler, etc):
>>
>> With rotational disks you have to optimize iops to minimize seeks.
>> This includes:
>>
>> 1. Switch off sendfile; it works badly on such workloads under Linux
>> because there is no way to control readahead (and hence the blocks
>> read from disk).
>>
>> 2. Use large output buffers, something like
>>
>>    output_buffers 1 512k;
>>
>> would be a good starting point.
>>
>> 3. Try using aio to ensure better disk concurrency (note that under
>> Linux it needs directio as well), i.e. something like this
>>
>>    aio on;
>>    directio 512;
>>
>> (this will require a newer kernel, though; using 2.6.18 nowadays
>> looks like a bad idea anyway, at least if you need speed)
>>
>> 4. Try tuning the I/O scheduler; there have been reports that deadline
>> might be better for such workloads.
>>
>> More details can be found here:
>>
>> http://nginx.org/r/output_buffers
>> http://nginx.org/r/aio
>> http://nginx.org/r/directio
>>
>> Maxim Dounin
>>
>> _______________________________________________
>> nginx mailing list
>> nginx at nginx.org
>> http://mailman.nginx.org/mailman/listinfo/nginx
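
For reference, suggestions 1 to 3 in Maxim's reply quoted above might be combined roughly as follows. This is only a sketch: placing the directives at http level is an assumption (they are also valid in server and location blocks), and the values are the starting points from the reply, not tuned settings.

```nginx
http {
    sendfile        off;      # 1. avoid sendfile on this seek-bound workload
    output_buffers  1 512k;   # 2. one large buffer for reading the response from disk
    aio             on;       # 3. asynchronous file I/O ...
    directio        512;      #    ... which under Linux also needs directio
}
```

Suggestion 4 (the I/O scheduler) is a kernel setting rather than an nginx directive, so it is not part of this block.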