Nginx stops sending file after ~1.5mb ?

Yo'av Moshe bjesus at gmail.com
Tue Jan 27 12:58:58 MSK 2009


Applied the patch, still the same.
The other only difference I can think about is that I'm running Cherokee on
a different port. Maybe my VPS is handling them differently.
I'll try testing it on the same port this evening and let you know.

Thank you.

Yo'av

On Tue, Jan 27, 2009 at 11:40 AM, Igor Sysoev <is at rambler-co.ru> wrote:

> On Tue, Jan 27, 2009 at 11:25:09AM +0200, Yo'av Moshe wrote:
>
> > Sure, here my Cherokee's strace:
> > http://pastebin.com/m13cd213d
> >
> > What you are saying about tcpdump's logs is strange, but I have no idea
> > what's causing it.
>
> I see only two difference with nginx:
>
> 1) it uses level triggered epoll (nginx uses edge triggered epoll),
> 2) and it uses TCP_NODELAY for first request (nginx set TCP_NODELAY
>   only just before going to keep-alive).
>
> The attached patch sets TCP_NODELAY from very start.
>
> > Yo'av.
> >
> > On Tue, Jan 27, 2009 at 10:51 AM, Igor Sysoev <is at rambler-co.ru> wrote:
> >
> > > On Tue, Jan 27, 2009 at 10:42:57AM +0200, Yo'av Moshe wrote:
> > >
> > > > Note I just compiled Cherokee 0.98. It is serving the file as it
> should
> > > > using sendfile and epoll.
> > >
> > > Have you tried this on the same host ?
> > > Could you post strace ?
> > >
> > > > On Mon, Jan 26, 2009 at 2:05 PM, Yo'av Moshe <bjesus at gmail.com>
> wrote:
> > > >
> > > > > Hey, attached is my tcpdump log.
> > > > > I've never used tcpdump before, so I hope the log is fine.
> > > > > I used grep to show only the connections from my computer while
> running
> > > > > 'wget http://www.noal.org.il/static/temp/hozer.pdf'.
> > > > >
> > > > > Hope this will lead us somewhere...
> > > > >
> > > > > Yo'av.
> > > > >
> > > > >
> > > > > On Sun, Jan 25, 2009 at 9:21 PM, Igor Sysoev <is at rambler-co.ru>
> wrote:
> > > > >
> > > > >> On Sun, Jan 25, 2009 at 09:04:54PM +0200, Yo'av Moshe wrote:
> > > > >>
> > > > >> > Sendfile is now off, using select, and still the same...
> > > > >> > http://pastebin.com/m2c3ea25b
> > > > >>
> > > > >> I have no idea, why it may be so.
> > > > >>
> > > > >> I see that select() returned event ("out [19]"):
> > > > >>
> > > > >> 28215 20:50:18 select(20, [10 19], [19], NULL, {0, 500000}
> <unfinished
> > > > >> ...>
> > > > >> 28215 20:50:18 <... select resumed> )   = 1 (out [19], left {0,
> > > 380000})
> > > > >>
> > > > >> However, at some stage select() did not return event, although the
> > > event
> > > > >> is still active ("select(20, [10 19], [19], NULL ..."):
> > > > >>
> > > > >> 28215 20:50:18 gettimeofday({1232909418, 293895}, NULL) = 0
> > > > >> 28215 20:50:18 writev(19,
> > > > >>
> > >
> [{"\227\1\2\1\3\1\4\1\5\0\373\0\374\1\230\1\231\1\232\1\233\0\375\0\376\1\6\1\7\1\10\0"...,
> > > > >> 9676}], 1) = 9676
> > > > >> 28215 20:50:18 pread64(20,
> > > > >>
> > > "\24\7\6#\"\'&\0214\22$63\1\6\25\24\26\27\02632654&#\"\6\3\226\204"...,
> > > > >> 32768, 1605632) = 32768
> > > > >> 28215 20:50:18 writev(19,
> > > > >>
> > >
> [{"\24\7\6#\"\'&\0214\22$63\1\6\25\24\26\27\02632654&#\"\6\3\226\204"...,
> > > > >> 32768}], 1) = 32768
> > > > >> 28215 20:50:18 pread64(20, "\0 \0M\0o\0r\0i\0s\0o\0n\0\'\0s\0
> > > > >> \0d\0i\0r\0e\0c"..., 32768, 1638400) = 32768
> > > > >> 28215 20:50:18 writev(19, [{"\0 \0M\0o\0r\0i\0s\0o\0n\0\'\0s\0
> > > > >> \0d\0i\0r\0e\0c"..., 32768}], 1) = 1716
> > > > >> 28215 20:50:18 select(20, [10 19], [19], NULL, {0, 500000}
> <unfinished
> > > > >> ...>
> > > > >> 28215 20:50:18 <... select resumed> )   = 0 (Timeout)
> > > > >>
> > > > >> Could you run tcpdump of the session ?
> > > > >>
> > > > >> > Yo'av
> > > > >> >
> > > > >> > On Sun, Jan 25, 2009 at 7:33 PM, Igor Sysoev <is at rambler-co.ru>
> > > wrote:
> > > > >> >
> > > > >> > > On Sun, Jan 25, 2009 at 06:26:58PM +0200, Yo'av Moshe wrote:
> > > > >> > >
> > > > >> > > > Igor, I tried using select instead of poll, but I get the
> exact
> > > same
> > > > >> > > > problem.
> > > > >> > > >
> > > > >> > > > Check my strace:
> > > > >> > > > http://pastebin.com/m65056ec0
> > > > >> > > >
> > > > >> > > > I get the same thing with rtsig too.
> > > > >> > > >
> > > > >> > > > Hints?
> > > > >> > > > Again, file is at
> http://www.noal.org.il/static/temp/hozer.pdf.
> > > > >> Server
> > > > >> > > is
> > > > >> > > > using select now.
> > > > >> > >
> > > > >> > > According strace select() does not return event for socket 19
> > > after
> > > > >> > > sendfile64() sent 1651005 bytes:
> > > > >> > >
> > > > >> > > 25951 18:14:26 select(20, [7 8 16 19], [19], NULL, {60, 0}
> > > <unfinished
> > > > >> ...>
> > > > >> > > 25951 18:14:27 <... select resumed> )   = 1 (out [19], left
> {58,
> > > > >> 790000})
> > > > >> > >
> > > > >> > > 25951 18:14:27 gettimeofday({1232900067, 890376}, NULL) = 0
> > > > >> > > 25951 18:14:27 sendfile64(19, 20, [1509465], 440765
> <unfinished
> > > ...>
> > > > >> > > 25951 18:14:27 <... sendfile64 resumed> ) = 141540
> > > > >> > > 25951 18:14:27 sendfile64(19, 20, [1651005], 299225
> <unfinished
> > > ...>
> > > > >> > > 25951 18:14:27 <... sendfile64 resumed> ) = -1 EAGAIN
> (Resource
> > > > >> temporarily
> > > > >> > > unavailable)
> > > > >> > >
> > > > >> > > 25951 18:14:27 select(20, [16 19], [19], NULL, {0, 500000}
> > > <unfinished
> > > > >> ...>
> > > > >> > > 25951 18:14:28 <... select resumed> )   = 0 (Timeout)
> > > > >> > >
> > > > >> > > As you have got the same issue with all methods (even with
> level
> > > > >> triggered
> > > > >> > > select()), I believe that the bug is in sendfile. You may try
> to
> > > turn
> > > > >> > > it off even for the single file:
> > > > >> > >
> > > > >> > >    location = /static/temp/hozer.pdf {
> > > > >> > >        sendfile off;
> > > > >> > >    }
> > > > >> > >
> > > > >> > > In Linux 2.6.23 sendfile() has been rewritten to use splice
> > > framework.
> > > > >> > > The bug may be introduced while rewriting.
> > > > >> > >
> > > > >> > > BTW, it seems that thttpd does not use sendfile.
> > > > >> > >
> > > > >> > > > Thanks...
> > > > >> > > > Yo'av.
> > > > >> > > >
> > > > >> > > > On Fri, Jan 23, 2009 at 3:05 PM, Igor Sysoev <
> is at rambler-co.ru>
> > > > >> wrote:
> > > > >> > > >
> > > > >> > > > > On Fri, Jan 23, 2009 at 02:35:00PM +0200, Yo'av Moshe
> wrote:
> > > > >> > > > >
> > > > >> > > > > > Thanks Igor.
> > > > >> > > > > > I don't want to be rude, but I find it hard to believe
> that
> > > it's
> > > > >> a
> > > > >> > > kernel
> > > > >> > > > > > bug.
> > > > >> > > > > >
> > > > >> > > > > > First, I'm using the default CentOS 5.2 kernel, it's
> seems
> > > to me
> > > > >> > > pretty
> > > > >> > > > > odd
> > > > >> > > > > > that there's such a bug in it. CentOS is being used on a
> lot
> > > of
> > > > >> > > servers,
> > > > >> > > > > > some of them must be running nginx.
> > > > >> > > > > >
> > > > >> > > > > > Second, I tried serving the same file with thttpd, and
> it
> > > runs
> > > > >> just
> > > > >> > > fine.
> > > > >> > > > > > http://195.28.180.147:40/temp/hozer.pdf
> > > > >> > > > > > Note it's the exact same file which I cannot get from
> nginx:
> > > > >> > > > > > http://www.noal.org.il/static/temp/hozer.pdf
> > > > >> > > > > >
> > > > >> > > > > > How come thttpd can serve the file just fine? If it's a
> > > kernel
> > > > >> bug, I
> > > > >> > > > > > thought it would happen with all of my servers.
> > > > >> > > > > > If it's using some different way of sending the files
> which
> > > does
> > > > >> not
> > > > >> > > > > depend
> > > > >> > > > > > on that epoll call - is there any way to configure nginx
> to
> > > use
> > > > >> that
> > > > >> > > as
> > > > >> > > > > > well?
> > > > >> > > > >
> > > > >> > > > > I do not look modern thttpd, but according its change log,
> it
> > > has
> > > > >> no
> > > > >> > > > > epoll support. You may try different ways in nginx: you
> need
> > > to
> > > > >> build
> > > > >> > > it
> > > > >> > > > >
> > > > >> > > > >        --with-rtsig_module
> > > > >> > > > >        --with-select_module
> > > > >> > > > >        --with-poll_module
> > > > >> > > > >
> > > > >> > > > > and then choose method:
> > > > >> > > > >
> > > > >> > > > > events {
> > > > >> > > > >     use  select;
> > > > >> > > > >     #use  poll;
> > > > >> > > > >     #use  rtsig;
> > > > >> > > > > }
> > > > >> > > > >
> > > > >> > > > > Also, epoll has two modes: level triggered (default) and
> edge
> > > > >> > > > > triggered (EPOLLET). nginx uses more effective edge
> triggered
> > > > >> mode,
> > > > >> > > > > lighttpd uses level triggered one. The bug may be in ET
> mode
> > > only.
> > > > >> > > > >
> > > > >> > > > > As to probabilty of kernel bugs: I saw them in FreeBSD
> > > (kqueue,
> > > > >> > > sendfile),
> > > > >> > > > > Linux (epoll), Solaris (event ports), and MacOSX (kqueue,
> > > > >> sendfile).
> > > > >> > > > >
> > > > >> > > > > > Can I do anything to confirm that it's a kernel bug? If
> it
> > > is, I
> > > > >> > > should
> > > > >> > > > > > probably contact CentOS's maintainers.
> > > > >> > > > >
> > > > >> > > > > Your straces confirms this: nginx added socket to epoll
> and
> > > did
> > > > >> not
> > > > >> > > > > deleted it, nevertheless epoll does not send event.
> > > > >> > > > >
> > > > >> > > > > > Thank you, again.
> > > > >> > > > > > Yo'av.
> > > > >> > > > > >
> > > > >> > > > > > On Wed, Jan 21, 2009 at 7:26 AM, Igor Sysoev <
> > > is at rambler-co.ru>
> > > > >> > > wrote:
> > > > >> > > > > >
> > > > >> > > > > > > On Wed, Jan 21, 2009 at 01:52:53AM +0200, Yo'av Moshe
> > > wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > Igor, here's the new strace.
> > > > >> > > > > > > > http://pastebin.com/m5129c54b
> > > > >> > > > > > > >
> > > > >> > > > > > > > Does the fact it works in low download rates still
> makes
> > > you
> > > > >> > > think
> > > > >> > > > > it's a
> > > > >> > > > > > > > kernel issue?
> > > > >> > > > > > >
> > > > >> > > > > > > Yes, this is seems like a kernel issue. Kernel excatly
> > > said
> > > > >> (with
> > > > >> > > > > EAGAIN)
> > > > >> > > > > > > that there is no place to output data, but never
> returned
> > > a
> > > > >> new
> > > > >> > > > > > > notification
> > > > >> > > > > > > via epoll_wait() about free space:
> > > > >> > > > > > >
> > > > >> > > > > > > 21420 01:39:43 <... epoll_wait resumed> {{EPOLLOUT,
> > > > >> {u32=153137960,
> > > > >> > > > > > > u64=13811628390360331048}}}, 512, 60000) = 1
> > > > >> > > > > > > 21420 01:39:43 gettimeofday({1232494783, 413764},
> NULL) =
> > > 0
> > > > >> > > > > > > 21420 01:39:43 sendfile64(20, 21, [1645625], 304605) =
> > > 40320
> > > > >> > > > > > > 21420 01:39:43 sendfile64(20, 21, [1685945], 264285) =
> -1
> > > > >> EAGAIN
> > > > >> > > > > (Resource
> > > > >> > > > > > > temporarily unavailable)
> > > > >> > > > > > > 21420 01:39:43 epoll_wait(11,  <unfinished ...>
> > > > >> > > > > > >
> > > > >> > > > > > > This is probably some race condition inside kernel,
> that
> > > does
> > > > >> not
> > > > >> > > > > appear on
> > > > >> > > > > > > low speed download.
> > > > >> > > > > > >
> > > > >> > > > > > > > Thank you, really!
> > > > >> > > > > > > > And thanks, Kiril and Jeff, for confirming that!
> > > > >> > > > > > > >
> > > > >> > > > > > > > Yo'av.
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > On Tue, Jan 20, 2009 at 9:39 PM, Kiril Angov <
> > > > >> > > kupokomapa at gmail.com>
> > > > >> > > > > > > wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > > wget
> http://www.noal.org.il/static/temp/barvazi2.pdf
> > > > >> > > > > > > > > --2009-01-20 21:36:00--
> > > > >> > > > > > > http://www.noal.org.il/static/temp/barvazi2.pdf
> > > > >> > > > > > > > > Resolving www.noal.org.il... 195.28.180.147
> > > > >> > > > > > > > > Connecting to www.noal.org.il
> |195.28.180.147|:80...
> > > > >> connected.
> > > > >> > > > > > > > > HTTP request sent, awaiting response... 200 OK
> > > > >> > > > > > > > > Length: 1950230 (1.9M) [application/pdf]
> > > > >> > > > > > > > > Saving to: `barvazi2.pdf'
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > 81% [=====================================>
>   ]
> > > > >> 1,596,665
> > > > >> > > > > > > > > --.-K/s  eta 5s
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > wget -c
> > > http://www.noal.org.il/static/temp/barvazi2.pdf
> > > > >> > > > > > > > > --2009-01-20 21:38:28--
> > > > >> > > > > > > > > http://www.noal.org.il/static/temp/barvazi2.pdf
> > > > >> > > > > > > > > Resolving www.noal.org.il... 195.28.180.147
> > > > >> > > > > > > > > Connecting to www.noal.org.il
> |195.28.180.147|:80...
> > > > >> connected.
> > > > >> > > > > > > > > HTTP request sent, awaiting response... 206
> Partial
> > > > >> Content
> > > > >> > > > > > > > > Length: 1950230 (1.9M), 305960 (299K) remaining
> > > > >> > > [application/pdf]
> > > > >> > > > > > > > > Saving to: `barvazi2.pdf'
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> 100%[+++++++++++++++++++++++++++++++++++++++=======>]
> > > > >> 1,950,230
> > > > >> > > > > > > > > 295K/s   in 1.0s
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > 2009-01-20 21:38:30 (295 KB/s) - `barvazi2.pdf'
> saved
> > > > >> > > > > [1950230/1950230]
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > On Tue, Jan 20, 2009 at 9:31 PM, jeff emminger <
> > > > >> > > > > jemminger at gmail.com>
> > > > >> > > > > > > > > wrote:
> > > > >> > > > > > > > > >> Could anyone confirm that this works:
> > > > >> > > > > > > > > >> wget --limit-rate=3500
> > > > >> > > > > > > http://www.noal.org.il/static/temp/barvazi2.pdf
> > > > >> > > > > > > > > >> while this doesn't:
> > > > >> > > > > > > > > >> wget
> > > http://www.noal.org.il/static/temp/barvazi2.pdf
> > > > >> > > > > > > > > >>
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Confirmed
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > --
> > > > >> > > > > > > > Yo'av Moshe
> > > > >> > > > > > >
> > > > >> > > > > > > --
> > > > >> > > > > > > Igor Sysoev
> > > > >> > > > > > > http://sysoev.ru/en/
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > > --
> > > > >> > > > > > Yo'av Moshe
> > > > >> > > > >
> > > > >> > > > > --
> > > > >> > > > > Igor Sysoev
> > > > >> > > > > http://sysoev.ru/en/
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > > Yo'av Moshe
> > > > >> > >
> > > > >> > > --
> > > > >> > > Igor Sysoev
> > > > >> > > http://sysoev.ru/en/
> > > > >> > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > Yo'av Moshe
> > > > >>
> > > > >> --
> > > > >> Igor Sysoev
> > > > >> http://sysoev.ru/en/
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > Yo'av Moshe
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Yo'av Moshe
> > >
> > > --
> > > Igor Sysoev
> > > http://sysoev.ru/en/
> > >
> > >
> >
> >
> > --
> > Yo'av Moshe
>
> --
> Igor Sysoev
> http://sysoev.ru/en/
>



-- 
Yo'av Moshe
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://nginx.org/pipermail/nginx/attachments/20090127/6de11da4/attachment.html>


More information about the nginx mailing list