All workers in 'D' state using sendfile

Drew Wareham m3rlin at gmail.com
Tue May 29 04:50:12 UTC 2012


Hi Ryan,

If I comment out "use epoll;", it still seems to use epoll according to
strace.  These are the compile flags I use:

./configure --prefix=/usr/local/nginx-1.2.0 --with-pcre
--add-module=/usr/local/src/nginx_upload_module-2.2.0
--with-http_stub_status_module --with-file-aio --without-http_proxy_module
--without-http_memcached_module --without-http_gzip_module
--without-http_ssi_module --without-http_userid_module
--without-http_autoindex_module --without-http_geo_module
--without-http_map_module --without-http_empty_gif_module
--without-http_browser_module --without-http_upstream_ip_hash_module
--without-http_charset_module
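
For what it's worth, this is roughly how I've been looking at the workers
(the PID below is just a placeholder for one of the worker PIDs):

    # show worker state and what each blocked worker is sleeping in
    ps -eo pid,stat,wchan:32,cmd | grep [n]ginx

    # attach to one worker and watch which event/IO syscalls it makes
    strace -p 12345 -e trace=epoll_wait,epoll_ctl,sendfile 2>&1 | head -50

The epoll_wait calls show up whether "use epoll;" is commented out or not.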

This is what I'm running with now:

events {
        worker_connections      51200;
#       use                     epoll;
#       multi_accept            on;
}

http {
        include                         mime.types;
        default_type                    application/octet-stream;

        server_tokens                   off;
        sendfile                        off;
        tcp_nopush                      on;
        tcp_nodelay                     on;
        keepalive_timeout               10;

        aio                             on;
        directio                        4k;
        output_buffers                  1 512k;
        max_ranges                      5;
        ...
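
If I end up trying the io scheduler suggestion as well, my understanding is
that it's something along these lines (just a sketch - "sda" stands in for
whatever the AoE volume actually shows up as):

    # show the available schedulers, with the active one in brackets
    cat /sys/block/sda/queue/scheduler

    # switch to deadline at runtime (needs root; not persistent across reboots)
    echo deadline > /sys/block/sda/queue/scheduler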


Cheers,

Drew


On Tue, May 29, 2012 at 12:31 AM, Ryan Brown <mp3geek at gmail.com> wrote:

> If you comment out or remove "use epoll;", does that help?
>
>
> On Mon, May 28, 2012 at 10:06 PM, 姚伟斌 <nbubingo at gmail.com> wrote:
> > Hi Maxim,
> >
> > Are there any plans to develop a threaded version?
> >
> > 2012/5/12 Maxim Dounin <mdounin at mdounin.ru>:
> >> Hello!
> >>
> >> On Sat, May 12, 2012 at 08:28:14PM +1000, Drew Wareham wrote:
> >>
> >>> Hello,
> >>>
> >>> I have tried to summarize this as much as possible but it's still a
> >>> lot of text.  I apologize but wanted to make sure that I provide
> >>> enough information to explain the issue properly.
> >>>
> >>> I'm hoping that somebody who uses nginx as a high traffic/concurrency
> >>> download server will be able to shed some light on this issue.  I've
> >>> tried as many things as I can think of and everything keeps pointing
> >>> to it being an issue with nginx, not the server - but I am of course
> >>> more than willing to try any suggestions provided.
> >>>
> >>> *Background:*
> >>> Approx. 1,500 - 5,000 concurrent connections (peak / off-peak),
> >>> Files vary in size from 5MB to 2GB,
> >>> All downloads; only very small dynamic content scripts run on these
> >>> servers, and none take more than 1-3 seconds,
> >>> Files are hosted on direct-attached AoE storage with a dedicated
> >>> 10GE link,
> >>> Server is running nginx-1.0.11, php-fpm 5.3 and CentOS 5.8 x64
> >>> (2.6.18-308.4.1.el5.centos.plus).
> >>> Specs are: Dual Xeon E5649 (6 Core), 32GB RAM, 300GB 10k SAS HDD,
> >>> AoE DAS over 10GE
> >>> Download speeds are restricted by the PHP handoff using
> >>> X-Accel-Redirect, but obviously not when I'm testing ;)
> >>>
> >>> *Issue:*
> >>> After running for a short but random period of time (5min ~ 90min),
> >>> all nginx workers will eventually end up in a 'D' state according
> >>> to ps/top.  This causes all downloads to run extremely slowly
> >>> (~25kb/s), but it doesn't seem to be caused by I/O because an scp of
> >>> the same file will complete at the expected speed of ~750MB+/s.
> >>>
> >>> I usually run with worker_processes set to 13, but I've had to raise
> >>> this to 50 to prevent the issue.  This works short term, but I'm
> >>> guessing eventually I will need to restart nginx to fix it.
> >>>
> >>> *Config:*
> >>> I'm using sendfile with epoll, and using the following events / http
> >>> settings (I've removed the location block with the fastcgi handler,
> >>> etc):
> >>
> >> With rotational disks you have to optimize iops to minimize seeks.
> >> This includes:
> >>
> >> 1. Switch off sendfile; it works badly on such workloads under
> >> Linux because there is no way to control readahead (and hence the
> >> blocks read from disk).
> >>
> >> 2. Use large output buffers, something like
> >>
> >>    output_buffers 1 512k;
> >>
> >> would be a good starting point.
> >>
> >> 3. Try using aio to ensure better disk concurrency (and note that
> >> under Linux it needs directio as well), i.e. something like this:
> >>
> >>    aio on;
> >>    directio 512;
> >>
> >> (this requires a newer kernel, though using 2.6.18 nowadays looks
> >> like a bad idea anyway, at least if you need speed)
> >>
> >> 4. Try tuning the io scheduler; there have been reports that
> >> deadline might be better for such workloads.
> >>
> >> More details can be found here:
> >>
> >> http://nginx.org/r/output_buffers
> >> http://nginx.org/r/aio
> >> http://nginx.org/r/directio
> >>
> >> Maxim Dounin