Nginx stops sending file after ~1.5mb ?

Igor Sysoev is at rambler-co.ru
Tue Jan 27 14:17:29 MSK 2009


On Tue, Jan 27, 2009 at 12:24:50PM +0200, Yo'av Moshe wrote:

> Right, it's working on the other port.
> I'll check it again with my VPS.
> 
> Sorry for all that.

OK. Please, when issue will be resolved, post the cause in the list.

> Yo'av.
> 
> On Tue, Jan 27, 2009 at 12:05 PM, Igor Sysoev <is at rambler-co.ru> wrote:
> 
> > On Tue, Jan 27, 2009 at 11:58:58AM +0200, Yo'av Moshe wrote:
> >
> > > Applied the patch, still the same.
> > > The other only difference I can think about is that I'm running Cherokee
> > on
> > > a different port. Maybe my VPS is handling them differently.
> > > I'll try testing it on the same port this evening and let you know.
> >
> > Could you run another nginx on this port ?
> > Or even the same nginx:
> >
> >        server {
> >            listen  80;
> >            listen  PORT;
> >
> >            ...
> >         }
> >
> > > Thank you.
> > >
> > > Yo'av
> > >
> > > On Tue, Jan 27, 2009 at 11:40 AM, Igor Sysoev <is at rambler-co.ru> wrote:
> > >
> > > > On Tue, Jan 27, 2009 at 11:25:09AM +0200, Yo'av Moshe wrote:
> > > >
> > > > > Sure, here my Cherokee's strace:
> > > > > http://pastebin.com/m13cd213d
> > > > >
> > > > > What you are saying about tcpdump's logs is strange, but I have no
> > idea
> > > > > what's causing it.
> > > >
> > > > I see only two difference with nginx:
> > > >
> > > > 1) it uses level triggered epoll (nginx uses edge triggered epoll),
> > > > 2) and it uses TCP_NODELAY for first request (nginx set TCP_NODELAY
> > > >   only just before going to keep-alive).
> > > >
> > > > The attached patch sets TCP_NODELAY from very start.
> > > >
> > > > > Yo'av.
> > > > >
> > > > > On Tue, Jan 27, 2009 at 10:51 AM, Igor Sysoev <is at rambler-co.ru>
> > wrote:
> > > > >
> > > > > > On Tue, Jan 27, 2009 at 10:42:57AM +0200, Yo'av Moshe wrote:
> > > > > >
> > > > > > > Note I just compiled Cherokee 0.98. It is serving the file as it
> > > > should
> > > > > > > using sendfile and epoll.
> > > > > >
> > > > > > Have you tried this on the same host ?
> > > > > > Could you post strace ?
> > > > > >
> > > > > > > On Mon, Jan 26, 2009 at 2:05 PM, Yo'av Moshe <bjesus at gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > > Hey, attached is my tcpdump log.
> > > > > > > > I've never used tcpdump before, so I hope the log is fine.
> > > > > > > > I used grep to show only the connections from my computer while
> > > > running
> > > > > > > > 'wget http://www.noal.org.il/static/temp/hozer.pdf'.
> > > > > > > >
> > > > > > > > Hope this will lead us somewhere...
> > > > > > > >
> > > > > > > > Yo'av.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sun, Jan 25, 2009 at 9:21 PM, Igor Sysoev <is at rambler-co.ru
> > >
> > > > wrote:
> > > > > > > >
> > > > > > > >> On Sun, Jan 25, 2009 at 09:04:54PM +0200, Yo'av Moshe wrote:
> > > > > > > >>
> > > > > > > >> > Sendfile is now off, using select, and still the same...
> > > > > > > >> > http://pastebin.com/m2c3ea25b
> > > > > > > >>
> > > > > > > >> I have no idea, why it may be so.
> > > > > > > >>
> > > > > > > >> I see that select() returned event ("out [19]"):
> > > > > > > >>
> > > > > > > >> 28215 20:50:18 select(20, [10 19], [19], NULL, {0, 500000}
> > > > <unfinished
> > > > > > > >> ...>
> > > > > > > >> 28215 20:50:18 <... select resumed> )   = 1 (out [19], left
> > {0,
> > > > > > 380000})
> > > > > > > >>
> > > > > > > >> However, at some stage select() did not return event, although
> > the
> > > > > > event
> > > > > > > >> is still active ("select(20, [10 19], [19], NULL ..."):
> > > > > > > >>
> > > > > > > >> 28215 20:50:18 gettimeofday({1232909418, 293895}, NULL) = 0
> > > > > > > >> 28215 20:50:18 writev(19,
> > > > > > > >>
> > > > > >
> > > >
> > [{"\227\1\2\1\3\1\4\1\5\0\373\0\374\1\230\1\231\1\232\1\233\0\375\0\376\1\6\1\7\1\10\0"...,
> > > > > > > >> 9676}], 1) = 9676
> > > > > > > >> 28215 20:50:18 pread64(20,
> > > > > > > >>
> > > > > >
> > "\24\7\6#\"\'&\0214\22$63\1\6\25\24\26\27\02632654&#\"\6\3\226\204"...,
> > > > > > > >> 32768, 1605632) = 32768
> > > > > > > >> 28215 20:50:18 writev(19,
> > > > > > > >>
> > > > > >
> > > >
> > [{"\24\7\6#\"\'&\0214\22$63\1\6\25\24\26\27\02632654&#\"\6\3\226\204"...,
> > > > > > > >> 32768}], 1) = 32768
> > > > > > > >> 28215 20:50:18 pread64(20, "\0 \0M\0o\0r\0i\0s\0o\0n\0\'\0s\0
> > > > > > > >> \0d\0i\0r\0e\0c"..., 32768, 1638400) = 32768
> > > > > > > >> 28215 20:50:18 writev(19, [{"\0 \0M\0o\0r\0i\0s\0o\0n\0\'\0s\0
> > > > > > > >> \0d\0i\0r\0e\0c"..., 32768}], 1) = 1716
> > > > > > > >> 28215 20:50:18 select(20, [10 19], [19], NULL, {0, 500000}
> > > > <unfinished
> > > > > > > >> ...>
> > > > > > > >> 28215 20:50:18 <... select resumed> )   = 0 (Timeout)
> > > > > > > >>
> > > > > > > >> Could you run tcpdump of the session ?
> > > > > > > >>
> > > > > > > >> > Yo'av
> > > > > > > >> >
> > > > > > > >> > On Sun, Jan 25, 2009 at 7:33 PM, Igor Sysoev <
> > is at rambler-co.ru>
> > > > > > wrote:
> > > > > > > >> >
> > > > > > > >> > > On Sun, Jan 25, 2009 at 06:26:58PM +0200, Yo'av Moshe
> > wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Igor, I tried using select instead of poll, but I get
> > the
> > > > exact
> > > > > > same
> > > > > > > >> > > > problem.
> > > > > > > >> > > >
> > > > > > > >> > > > Check my strace:
> > > > > > > >> > > > http://pastebin.com/m65056ec0
> > > > > > > >> > > >
> > > > > > > >> > > > I get the same thing with rtsig too.
> > > > > > > >> > > >
> > > > > > > >> > > > Hints?
> > > > > > > >> > > > Again, file is at
> > > > http://www.noal.org.il/static/temp/hozer.pdf.
> > > > > > > >> Server
> > > > > > > >> > > is
> > > > > > > >> > > > using select now.
> > > > > > > >> > >
> > > > > > > >> > > According strace select() does not return event for socket
> > 19
> > > > > > after
> > > > > > > >> > > sendfile64() sent 1651005 bytes:
> > > > > > > >> > >
> > > > > > > >> > > 25951 18:14:26 select(20, [7 8 16 19], [19], NULL, {60, 0}
> > > > > > <unfinished
> > > > > > > >> ...>
> > > > > > > >> > > 25951 18:14:27 <... select resumed> )   = 1 (out [19],
> > left
> > > > {58,
> > > > > > > >> 790000})
> > > > > > > >> > >
> > > > > > > >> > > 25951 18:14:27 gettimeofday({1232900067, 890376}, NULL) =
> > 0
> > > > > > > >> > > 25951 18:14:27 sendfile64(19, 20, [1509465], 440765
> > > > <unfinished
> > > > > > ...>
> > > > > > > >> > > 25951 18:14:27 <... sendfile64 resumed> ) = 141540
> > > > > > > >> > > 25951 18:14:27 sendfile64(19, 20, [1651005], 299225
> > > > <unfinished
> > > > > > ...>
> > > > > > > >> > > 25951 18:14:27 <... sendfile64 resumed> ) = -1 EAGAIN
> > > > (Resource
> > > > > > > >> temporarily
> > > > > > > >> > > unavailable)
> > > > > > > >> > >
> > > > > > > >> > > 25951 18:14:27 select(20, [16 19], [19], NULL, {0, 500000}
> > > > > > <unfinished
> > > > > > > >> ...>
> > > > > > > >> > > 25951 18:14:28 <... select resumed> )   = 0 (Timeout)
> > > > > > > >> > >
> > > > > > > >> > > As you have got the same issue with all methods (even with
> > > > level
> > > > > > > >> triggered
> > > > > > > >> > > select()), I believe that the bug is in sendfile. You may
> > try
> > > > to
> > > > > > turn
> > > > > > > >> > > it off even for the single file:
> > > > > > > >> > >
> > > > > > > >> > >    location = /static/temp/hozer.pdf {
> > > > > > > >> > >        sendfile off;
> > > > > > > >> > >    }
> > > > > > > >> > >
> > > > > > > >> > > In Linux 2.6.23 sendfile() has been rewritten to use
> > splice
> > > > > > framework.
> > > > > > > >> > > The bug may be introduced while rewriting.
> > > > > > > >> > >
> > > > > > > >> > > BTW, it seems that thttpd does not use sendfile.
> > > > > > > >> > >
> > > > > > > >> > > > Thanks...
> > > > > > > >> > > > Yo'av.
> > > > > > > >> > > >
> > > > > > > >> > > > On Fri, Jan 23, 2009 at 3:05 PM, Igor Sysoev <
> > > > is at rambler-co.ru>
> > > > > > > >> wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > On Fri, Jan 23, 2009 at 02:35:00PM +0200, Yo'av Moshe
> > > > wrote:
> > > > > > > >> > > > >
> > > > > > > >> > > > > > Thanks Igor.
> > > > > > > >> > > > > > I don't want to be rude, but I find it hard to
> > believe
> > > > that
> > > > > > it's
> > > > > > > >> a
> > > > > > > >> > > kernel
> > > > > > > >> > > > > > bug.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > First, I'm using the default CentOS 5.2 kernel, it's
> > > > seems
> > > > > > to me
> > > > > > > >> > > pretty
> > > > > > > >> > > > > odd
> > > > > > > >> > > > > > that there's such a bug in it. CentOS is being used
> > on a
> > > > lot
> > > > > > of
> > > > > > > >> > > servers,
> > > > > > > >> > > > > > some of them must be running nginx.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Second, I tried serving the same file with thttpd,
> > and
> > > > it
> > > > > > runs
> > > > > > > >> just
> > > > > > > >> > > fine.
> > > > > > > >> > > > > > http://195.28.180.147:40/temp/hozer.pdf
> > > > > > > >> > > > > > Note it's the exact same file which I cannot get
> > from
> > > > nginx:
> > > > > > > >> > > > > > http://www.noal.org.il/static/temp/hozer.pdf
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > How come thttpd can serve the file just fine? If
> > it's a
> > > > > > kernel
> > > > > > > >> bug, I
> > > > > > > >> > > > > > thought it would happen with all of my servers.
> > > > > > > >> > > > > > If it's using some different way of sending the
> > files
> > > > which
> > > > > > does
> > > > > > > >> not
> > > > > > > >> > > > > depend
> > > > > > > >> > > > > > on that epoll call - is there any way to configure
> > nginx
> > > > to
> > > > > > use
> > > > > > > >> that
> > > > > > > >> > > as
> > > > > > > >> > > > > > well?
> > > > > > > >> > > > >
> > > > > > > >> > > > > I do not look modern thttpd, but according its change
> > log,
> > > > it
> > > > > > has
> > > > > > > >> no
> > > > > > > >> > > > > epoll support. You may try different ways in nginx:
> > you
> > > > need
> > > > > > to
> > > > > > > >> build
> > > > > > > >> > > it
> > > > > > > >> > > > >
> > > > > > > >> > > > >        --with-rtsig_module
> > > > > > > >> > > > >        --with-select_module
> > > > > > > >> > > > >        --with-poll_module
> > > > > > > >> > > > >
> > > > > > > >> > > > > and then choose method:
> > > > > > > >> > > > >
> > > > > > > >> > > > > events {
> > > > > > > >> > > > >     use  select;
> > > > > > > >> > > > >     #use  poll;
> > > > > > > >> > > > >     #use  rtsig;
> > > > > > > >> > > > > }
> > > > > > > >> > > > >
> > > > > > > >> > > > > Also, epoll has two modes: level triggered (default)
> > and
> > > > edge
> > > > > > > >> > > > > triggered (EPOLLET). nginx uses more effective edge
> > > > triggered
> > > > > > > >> mode,
> > > > > > > >> > > > > lighttpd uses level triggered one. The bug may be in
> > ET
> > > > mode
> > > > > > only.
> > > > > > > >> > > > >
> > > > > > > >> > > > > As to probabilty of kernel bugs: I saw them in FreeBSD
> > > > > > (kqueue,
> > > > > > > >> > > sendfile),
> > > > > > > >> > > > > Linux (epoll), Solaris (event ports), and MacOSX
> > (kqueue,
> > > > > > > >> sendfile).
> > > > > > > >> > > > >
> > > > > > > >> > > > > > Can I do anything to confirm that it's a kernel bug?
> > If
> > > > it
> > > > > > is, I
> > > > > > > >> > > should
> > > > > > > >> > > > > > probably contact CentOS's maintainers.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Your straces confirms this: nginx added socket to
> > epoll
> > > > and
> > > > > > did
> > > > > > > >> not
> > > > > > > >> > > > > deleted it, nevertheless epoll does not send event.
> > > > > > > >> > > > >
> > > > > > > >> > > > > > Thank you, again.
> > > > > > > >> > > > > > Yo'av.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > On Wed, Jan 21, 2009 at 7:26 AM, Igor Sysoev <
> > > > > > is at rambler-co.ru>
> > > > > > > >> > > wrote:
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > > On Wed, Jan 21, 2009 at 01:52:53AM +0200, Yo'av
> > Moshe
> > > > > > wrote:
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > > Igor, here's the new strace.
> > > > > > > >> > > > > > > > http://pastebin.com/m5129c54b
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Does the fact it works in low download rates
> > still
> > > > makes
> > > > > > you
> > > > > > > >> > > think
> > > > > > > >> > > > > it's a
> > > > > > > >> > > > > > > > kernel issue?
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > Yes, this is seems like a kernel issue. Kernel
> > excatly
> > > > > > said
> > > > > > > >> (with
> > > > > > > >> > > > > EAGAIN)
> > > > > > > >> > > > > > > that there is no place to output data, but never
> > > > returned
> > > > > > a
> > > > > > > >> new
> > > > > > > >> > > > > > > notification
> > > > > > > >> > > > > > > via epoll_wait() about free space:
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > 21420 01:39:43 <... epoll_wait resumed>
> > {{EPOLLOUT,
> > > > > > > >> {u32=153137960,
> > > > > > > >> > > > > > > u64=13811628390360331048}}}, 512, 60000) = 1
> > > > > > > >> > > > > > > 21420 01:39:43 gettimeofday({1232494783, 413764},
> > > > NULL) =
> > > > > > 0
> > > > > > > >> > > > > > > 21420 01:39:43 sendfile64(20, 21, [1645625],
> > 304605) =
> > > > > > 40320
> > > > > > > >> > > > > > > 21420 01:39:43 sendfile64(20, 21, [1685945],
> > 264285) =
> > > > -1
> > > > > > > >> EAGAIN
> > > > > > > >> > > > > (Resource
> > > > > > > >> > > > > > > temporarily unavailable)
> > > > > > > >> > > > > > > 21420 01:39:43 epoll_wait(11,  <unfinished ...>
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > This is probably some race condition inside
> > kernel,
> > > > that
> > > > > > does
> > > > > > > >> not
> > > > > > > >> > > > > appear on
> > > > > > > >> > > > > > > low speed download.
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > > Thank you, really!
> > > > > > > >> > > > > > > > And thanks, Kiril and Jeff, for confirming that!
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Yo'av.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > On Tue, Jan 20, 2009 at 9:39 PM, Kiril Angov <
> > > > > > > >> > > kupokomapa at gmail.com>
> > > > > > > >> > > > > > > wrote:
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > > wget
> > > > http://www.noal.org.il/static/temp/barvazi2.pdf
> > > > > > > >> > > > > > > > > --2009-01-20 21:36:00--
> > > > > > > >> > > > > > > http://www.noal.org.il/static/temp/barvazi2.pdf
> > > > > > > >> > > > > > > > > Resolving www.noal.org.il... 195.28.180.147
> > > > > > > >> > > > > > > > > Connecting to www.noal.org.il
> > > > |195.28.180.147|:80...
> > > > > > > >> connected.
> > > > > > > >> > > > > > > > > HTTP request sent, awaiting response... 200 OK
> > > > > > > >> > > > > > > > > Length: 1950230 (1.9M) [application/pdf]
> > > > > > > >> > > > > > > > > Saving to: `barvazi2.pdf'
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > 81% [=====================================>
> > > >   ]
> > > > > > > >> 1,596,665
> > > > > > > >> > > > > > > > > --.-K/s  eta 5s
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > wget -c
> > > > > > http://www.noal.org.il/static/temp/barvazi2.pdf
> > > > > > > >> > > > > > > > > --2009-01-20 21:38:28--
> > > > > > > >> > > > > > > > >
> > http://www.noal.org.il/static/temp/barvazi2.pdf
> > > > > > > >> > > > > > > > > Resolving www.noal.org.il... 195.28.180.147
> > > > > > > >> > > > > > > > > Connecting to www.noal.org.il
> > > > |195.28.180.147|:80...
> > > > > > > >> connected.
> > > > > > > >> > > > > > > > > HTTP request sent, awaiting response... 206
> > > > Partial
> > > > > > > >> Content
> > > > > > > >> > > > > > > > > Length: 1950230 (1.9M), 305960 (299K)
> > remaining
> > > > > > > >> > > [application/pdf]
> > > > > > > >> > > > > > > > > Saving to: `barvazi2.pdf'
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > 100%[+++++++++++++++++++++++++++++++++++++++=======>]
> > > > > > > >> 1,950,230
> > > > > > > >> > > > > > > > > 295K/s   in 1.0s
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > 2009-01-20 21:38:30 (295 KB/s) -
> > `barvazi2.pdf'
> > > > saved
> > > > > > > >> > > > > [1950230/1950230]
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > On Tue, Jan 20, 2009 at 9:31 PM, jeff emminger
> > <
> > > > > > > >> > > > > jemminger at gmail.com>
> > > > > > > >> > > > > > > > > wrote:
> > > > > > > >> > > > > > > > > >> Could anyone confirm that this works:
> > > > > > > >> > > > > > > > > >> wget --limit-rate=3500
> > > > > > > >> > > > > > > http://www.noal.org.il/static/temp/barvazi2.pdf
> > > > > > > >> > > > > > > > > >> while this doesn't:
> > > > > > > >> > > > > > > > > >> wget
> > > > > > http://www.noal.org.il/static/temp/barvazi2.pdf
> > > > > > > >> > > > > > > > > >>
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > > Confirmed
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > --
> > > > > > > >> > > > > > > > Yo'av Moshe
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > --
> > > > > > > >> > > > > > > Igor Sysoev
> > > > > > > >> > > > > > > http://sysoev.ru/en/
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > >
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > --
> > > > > > > >> > > > > > Yo'av Moshe
> > > > > > > >> > > > >
> > > > > > > >> > > > > --
> > > > > > > >> > > > > Igor Sysoev
> > > > > > > >> > > > > http://sysoev.ru/en/
> > > > > > > >> > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > >
> > > > > > > >> > > >
> > > > > > > >> > > > --
> > > > > > > >> > > > Yo'av Moshe
> > > > > > > >> > >
> > > > > > > >> > > --
> > > > > > > >> > > Igor Sysoev
> > > > > > > >> > > http://sysoev.ru/en/
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > --
> > > > > > > >> > Yo'av Moshe
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> Igor Sysoev
> > > > > > > >> http://sysoev.ru/en/
> > > > > > > >>
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Yo'av Moshe
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Yo'av Moshe
> > > > > >
> > > > > > --
> > > > > > Igor Sysoev
> > > > > > http://sysoev.ru/en/
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Yo'av Moshe
> > > >
> > > > --
> > > > Igor Sysoev
> > > > http://sysoev.ru/en/
> > > >
> > >
> > >
> > >
> > > --
> > > Yo'av Moshe
> >
> > --
> > Igor Sysoev
> > http://sysoev.ru/en/
> >
> >
> 
> 
> -- 
> Yo'av Moshe

-- 
Igor Sysoev
http://sysoev.ru/en/





More information about the nginx mailing list