Nginx stops sending file after ~1.5mb ?

Yo'av Moshe bjesus at gmail.com
Tue Jan 27 11:42:57 MSK 2009


Note I just compiled Cherokee 0.98. It is serving the file as it should
using sendfile and epoll.

On Mon, Jan 26, 2009 at 2:05 PM, Yo'av Moshe <bjesus at gmail.com> wrote:

> Hey, attached is my tcpdump log.
> I've never used tcpdump before, so I hope the log is fine.
> I used grep to show only the connections from my computer while running
> 'wget http://www.noal.org.il/static/temp/hozer.pdf'.
>
> Hope this will lead us somewhere...
>
> Yo'av.
>
>
> On Sun, Jan 25, 2009 at 9:21 PM, Igor Sysoev <is at rambler-co.ru> wrote:
>
>> On Sun, Jan 25, 2009 at 09:04:54PM +0200, Yo'av Moshe wrote:
>>
>> > Sendfile is now off, using select, and still the same...
>> > http://pastebin.com/m2c3ea25b
>>
>> I have no idea, why it may be so.
>>
>> I see that select() returned event ("out [19]"):
>>
>> 28215 20:50:18 select(20, [10 19], [19], NULL, {0, 500000} <unfinished
>> ...>
>> 28215 20:50:18 <... select resumed> )   = 1 (out [19], left {0, 380000})
>>
>> However, at some stage select() did not return event, although the event
>> is still active ("select(20, [10 19], [19], NULL ..."):
>>
>> 28215 20:50:18 gettimeofday({1232909418, 293895}, NULL) = 0
>> 28215 20:50:18 writev(19,
>> [{"\227\1\2\1\3\1\4\1\5\0\373\0\374\1\230\1\231\1\232\1\233\0\375\0\376\1\6\1\7\1\10\0"...,
>> 9676}], 1) = 9676
>> 28215 20:50:18 pread64(20,
>> "\24\7\6#\"\'&\0214\22$63\1\6\25\24\26\27\02632654&#\"\6\3\226\204"...,
>> 32768, 1605632) = 32768
>> 28215 20:50:18 writev(19,
>> [{"\24\7\6#\"\'&\0214\22$63\1\6\25\24\26\27\02632654&#\"\6\3\226\204"...,
>> 32768}], 1) = 32768
>> 28215 20:50:18 pread64(20, "\0 \0M\0o\0r\0i\0s\0o\0n\0\'\0s\0
>> \0d\0i\0r\0e\0c"..., 32768, 1638400) = 32768
>> 28215 20:50:18 writev(19, [{"\0 \0M\0o\0r\0i\0s\0o\0n\0\'\0s\0
>> \0d\0i\0r\0e\0c"..., 32768}], 1) = 1716
>> 28215 20:50:18 select(20, [10 19], [19], NULL, {0, 500000} <unfinished
>> ...>
>> 28215 20:50:18 <... select resumed> )   = 0 (Timeout)
>>
>> Could you run tcpdump of the session ?
>>
>> > Yo'av
>> >
>> > On Sun, Jan 25, 2009 at 7:33 PM, Igor Sysoev <is at rambler-co.ru> wrote:
>> >
>> > > On Sun, Jan 25, 2009 at 06:26:58PM +0200, Yo'av Moshe wrote:
>> > >
>> > > > Igor, I tried using select instead of poll, but I get the exact same
>> > > > problem.
>> > > >
>> > > > Check my strace:
>> > > > http://pastebin.com/m65056ec0
>> > > >
>> > > > I get the same thing with rtsig too.
>> > > >
>> > > > Hints?
>> > > > Again, file is at http://www.noal.org.il/static/temp/hozer.pdf.
>> Server
>> > > is
>> > > > using select now.
>> > >
>> > > According strace select() does not return event for socket 19 after
>> > > sendfile64() sent 1651005 bytes:
>> > >
>> > > 25951 18:14:26 select(20, [7 8 16 19], [19], NULL, {60, 0} <unfinished
>> ...>
>> > > 25951 18:14:27 <... select resumed> )   = 1 (out [19], left {58,
>> 790000})
>> > >
>> > > 25951 18:14:27 gettimeofday({1232900067, 890376}, NULL) = 0
>> > > 25951 18:14:27 sendfile64(19, 20, [1509465], 440765 <unfinished ...>
>> > > 25951 18:14:27 <... sendfile64 resumed> ) = 141540
>> > > 25951 18:14:27 sendfile64(19, 20, [1651005], 299225 <unfinished ...>
>> > > 25951 18:14:27 <... sendfile64 resumed> ) = -1 EAGAIN (Resource
>> temporarily
>> > > unavailable)
>> > >
>> > > 25951 18:14:27 select(20, [16 19], [19], NULL, {0, 500000} <unfinished
>> ...>
>> > > 25951 18:14:28 <... select resumed> )   = 0 (Timeout)
>> > >
>> > > As you have got the same issue with all methods (even with level
>> triggered
>> > > select()), I believe that the bug is in sendfile. You may try to turn
>> > > it off even for the single file:
>> > >
>> > >    location = /static/temp/hozer.pdf {
>> > >        sendfile off;
>> > >    }
>> > >
>> > > In Linux 2.6.23 sendfile() has been rewritten to use splice framework.
>> > > The bug may be introduced while rewriting.
>> > >
>> > > BTW, it seems that thttpd does not use sendfile.
>> > >
>> > > > Thanks...
>> > > > Yo'av.
>> > > >
>> > > > On Fri, Jan 23, 2009 at 3:05 PM, Igor Sysoev <is at rambler-co.ru>
>> wrote:
>> > > >
>> > > > > On Fri, Jan 23, 2009 at 02:35:00PM +0200, Yo'av Moshe wrote:
>> > > > >
>> > > > > > Thanks Igor.
>> > > > > > I don't want to be rude, but I find it hard to believe that it's
>> a
>> > > kernel
>> > > > > > bug.
>> > > > > >
>> > > > > > First, I'm using the default CentOS 5.2 kernel, it's seems to me
>> > > pretty
>> > > > > odd
>> > > > > > that there's such a bug in it. CentOS is being used on a lot of
>> > > servers,
>> > > > > > some of them must be running nginx.
>> > > > > >
>> > > > > > Second, I tried serving the same file with thttpd, and it runs
>> just
>> > > fine.
>> > > > > > http://195.28.180.147:40/temp/hozer.pdf
>> > > > > > Note it's the exact same file which I cannot get from nginx:
>> > > > > > http://www.noal.org.il/static/temp/hozer.pdf
>> > > > > >
>> > > > > > How come thttpd can serve the file just fine? If it's a kernel
>> bug, I
>> > > > > > thought it would happen with all of my servers.
>> > > > > > If it's using some different way of sending the files which does
>> not
>> > > > > depend
>> > > > > > on that epoll call - is there any way to configure nginx to use
>> that
>> > > as
>> > > > > > well?
>> > > > >
>> > > > > I do not look modern thttpd, but according its change log, it has
>> no
>> > > > > epoll support. You may try different ways in nginx: you need to
>> build
>> > > it
>> > > > >
>> > > > >        --with-rtsig_module
>> > > > >        --with-select_module
>> > > > >        --with-poll_module
>> > > > >
>> > > > > and then choose method:
>> > > > >
>> > > > > events {
>> > > > >     use  select;
>> > > > >     #use  poll;
>> > > > >     #use  rtsig;
>> > > > > }
>> > > > >
>> > > > > Also, epoll has two modes: level triggered (default) and edge
>> > > > > triggered (EPOLLET). nginx uses more effective edge triggered
>> mode,
>> > > > > lighttpd uses level triggered one. The bug may be in ET mode only.
>> > > > >
>> > > > > As to probabilty of kernel bugs: I saw them in FreeBSD (kqueue,
>> > > sendfile),
>> > > > > Linux (epoll), Solaris (event ports), and MacOSX (kqueue,
>> sendfile).
>> > > > >
>> > > > > > Can I do anything to confirm that it's a kernel bug? If it is, I
>> > > should
>> > > > > > probably contact CentOS's maintainers.
>> > > > >
>> > > > > Your straces confirms this: nginx added socket to epoll and did
>> not
>> > > > > deleted it, nevertheless epoll does not send event.
>> > > > >
>> > > > > > Thank you, again.
>> > > > > > Yo'av.
>> > > > > >
>> > > > > > On Wed, Jan 21, 2009 at 7:26 AM, Igor Sysoev <is at rambler-co.ru>
>> > > wrote:
>> > > > > >
>> > > > > > > On Wed, Jan 21, 2009 at 01:52:53AM +0200, Yo'av Moshe wrote:
>> > > > > > >
>> > > > > > > > Igor, here's the new strace.
>> > > > > > > > http://pastebin.com/m5129c54b
>> > > > > > > >
>> > > > > > > > Does the fact it works in low download rates still makes you
>> > > think
>> > > > > it's a
>> > > > > > > > kernel issue?
>> > > > > > >
>> > > > > > > Yes, this is seems like a kernel issue. Kernel excatly said
>> (with
>> > > > > EAGAIN)
>> > > > > > > that there is no place to output data, but never returned a
>> new
>> > > > > > > notification
>> > > > > > > via epoll_wait() about free space:
>> > > > > > >
>> > > > > > > 21420 01:39:43 <... epoll_wait resumed> {{EPOLLOUT,
>> {u32=153137960,
>> > > > > > > u64=13811628390360331048}}}, 512, 60000) = 1
>> > > > > > > 21420 01:39:43 gettimeofday({1232494783, 413764}, NULL) = 0
>> > > > > > > 21420 01:39:43 sendfile64(20, 21, [1645625], 304605) = 40320
>> > > > > > > 21420 01:39:43 sendfile64(20, 21, [1685945], 264285) = -1
>> EAGAIN
>> > > > > (Resource
>> > > > > > > temporarily unavailable)
>> > > > > > > 21420 01:39:43 epoll_wait(11,  <unfinished ...>
>> > > > > > >
>> > > > > > > This is probably some race condition inside kernel, that does
>> not
>> > > > > appear on
>> > > > > > > low speed download.
>> > > > > > >
>> > > > > > > > Thank you, really!
>> > > > > > > > And thanks, Kiril and Jeff, for confirming that!
>> > > > > > > >
>> > > > > > > > Yo'av.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Tue, Jan 20, 2009 at 9:39 PM, Kiril Angov <
>> > > kupokomapa at gmail.com>
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > wget http://www.noal.org.il/static/temp/barvazi2.pdf
>> > > > > > > > > --2009-01-20 21:36:00--
>> > > > > > > http://www.noal.org.il/static/temp/barvazi2.pdf
>> > > > > > > > > Resolving www.noal.org.il... 195.28.180.147
>> > > > > > > > > Connecting to www.noal.org.il|195.28.180.147|:80...
>> connected.
>> > > > > > > > > HTTP request sent, awaiting response... 200 OK
>> > > > > > > > > Length: 1950230 (1.9M) [application/pdf]
>> > > > > > > > > Saving to: `barvazi2.pdf'
>> > > > > > > > >
>> > > > > > > > > 81% [=====================================>         ]
>> 1,596,665
>> > > > > > > > > --.-K/s  eta 5s
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > wget -c http://www.noal.org.il/static/temp/barvazi2.pdf
>> > > > > > > > > --2009-01-20 21:38:28--
>> > > > > > > > > http://www.noal.org.il/static/temp/barvazi2.pdf
>> > > > > > > > > Resolving www.noal.org.il... 195.28.180.147
>> > > > > > > > > Connecting to www.noal.org.il|195.28.180.147|:80...
>> connected.
>> > > > > > > > > HTTP request sent, awaiting response... 206 Partial
>> Content
>> > > > > > > > > Length: 1950230 (1.9M), 305960 (299K) remaining
>> > > [application/pdf]
>> > > > > > > > > Saving to: `barvazi2.pdf'
>> > > > > > > > >
>> > > > > > > > > 100%[+++++++++++++++++++++++++++++++++++++++=======>]
>> 1,950,230
>> > > > > > > > > 295K/s   in 1.0s
>> > > > > > > > >
>> > > > > > > > > 2009-01-20 21:38:30 (295 KB/s) - `barvazi2.pdf' saved
>> > > > > [1950230/1950230]
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Tue, Jan 20, 2009 at 9:31 PM, jeff emminger <
>> > > > > jemminger at gmail.com>
>> > > > > > > > > wrote:
>> > > > > > > > > >> Could anyone confirm that this works:
>> > > > > > > > > >> wget --limit-rate=3500
>> > > > > > > http://www.noal.org.il/static/temp/barvazi2.pdf
>> > > > > > > > > >> while this doesn't:
>> > > > > > > > > >> wget http://www.noal.org.il/static/temp/barvazi2.pdf
>> > > > > > > > > >>
>> > > > > > > > > >
>> > > > > > > > > > Confirmed
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > > Yo'av Moshe
>> > > > > > >
>> > > > > > > --
>> > > > > > > Igor Sysoev
>> > > > > > > http://sysoev.ru/en/
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Yo'av Moshe
>> > > > >
>> > > > > --
>> > > > > Igor Sysoev
>> > > > > http://sysoev.ru/en/
>> > > > >
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Yo'av Moshe
>> > >
>> > > --
>> > > Igor Sysoev
>> > > http://sysoev.ru/en/
>> > >
>> > >
>> >
>> >
>> > --
>> > Yo'av Moshe
>>
>> --
>> Igor Sysoev
>> http://sysoev.ru/en/
>>
>>
>
>
> --
> Yo'av Moshe
>



-- 
Yo'av Moshe
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://nginx.org/pipermail/nginx/attachments/20090127/06a5d445/attachment.html>


More information about the nginx mailing list