upstream timeouts I can not explain

Руслан Закиров ruz at sports.ru
Wed Jan 11 17:54:31 UTC 2017


On Wed, Jan 11, 2017 at 6:44 PM, Maxim Dounin <mdounin at mdounin.ru> wrote:

> Hello!
>
> On Wed, Jan 11, 2017 at 06:07:01PM +0300, Руслан Закиров wrote:
>
> > On Mon, Jan 9, 2017 at 10:59 PM, Maxim Dounin <mdounin at mdounin.ru>
> wrote:
> >
> > > Typical kern.sched.quantum is about 100ms, so several
> > > CPU-intensive tasks can delay processing of the events enough to
> > > trigger a timeout if a context switch happens at a bad time.
> > >
> >
> > Here is what I see in the truss output:
> >
> > 38.820523207 0.000006568 kevent(28,{ },0,{
> > 198,EVFILT_WRITE,EV_CLEAR,0x0,0x8218,0x821405071 },512,{ 6.215000000 }) = 1 (0x1)
> > 39.783094188 0.000022875 kevent(28,{ },0,{
> > 52,EVFILT_READ,0x0,0x0,0x30b,0x81f800068
> > 204,EVFILT_WRITE,EV_CLEAR,0x0,0x8218,0x821401588
> > 51,EVFILT_READ,0x0,0x0,0xec,0x81f800000
> > 68,EVFILT_READ,EV_CLEAR,0x0,0x8bf,0x81f816580
> > 7,EVFILT_READ,EV_CLEAR,0x0,0x27f,0x81f813869
> > 57,EVFILT_READ,EV_CLEAR,0x0,0x767,0x81f817bd8
> > 203,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0x248,0x81f8030c1
> > 181,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0x9b77,0x81f80ea68
> > 178,EVFILT_READ,EV_CLEAR,0x0,0x39d,0x81f8010a9
> > 198,EVFILT_READ,EV_CLEAR,0x0,0x3d3,0x81f805071
> > 204,EVFILT_READ,EV_CLEAR,0x0,0x9da,0x81f801588
> > 190,EVFILT_READ,EV_CLEAR,0x0,0x4ff,0x81f80fc48
> > 154,EVFILT_READ,EV_CLEAR,0x0,0x88e,0x81f8130b1
> > 151,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0xc1db,0x81f814290
> > 157,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0xe841,0x81f80c029
> > 195,EVFILT_READ,EV_CLEAR,0x0,0x952,0x81f8090a1
> > 194,EVFILT_READ,EV_CLEAR,0x0,0x929,0x81f809ac8
> > 201,EVFILT_READ,EV_CLEAR,0x0,0x4ef,0x81f80c980
> > 174,EVFILT_READ,EV_CLEAR,0x0,0x51e,0x81f816518
> > 77,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0x1168,0x81f811c61 },512,{ 5.253000000 }) = 20 (0x14)
> >
> >
> > There is a 1 second delay between the two syscalls. Then nginx goes nuts
> > processing everything it missed during that second. I cannot tell from
> > this output how much time was spent in the syscalls themselves. Can anyone?
>
> Using ktrace / kdump might be a better option, it shows both
> syscall enter and syscall return with exact timestamps.
>

Tried it. Found at least one issue: sendfile blocking:

 66193 nginx    1484152162.182601 CALL
 openat(AT_FDCWD,0x80639aac4,0x4<O_NONBLOCK>,<unused>0)
 66193 nginx    1484152162.182607 NAMI  ...
 66193 nginx    1484152162.182643 RET   openat 40/0x28
... gz lookup and stats ...
 66193 nginx    1484152162.182683 CALL
 setsockopt(0x211,0x6,0x4,0x7fffffffd96c,0x4)
 66193 nginx    1484152162.182687 RET   setsockopt 0
 66193 nginx    1484152162.182689 CALL
 sendfile(0x28,0x211,0,0x806f,0x7fffffffe1d8,0x7fffffffe240,<invalid=0>)
 66193 nginx    1484152163.541770 RET   sendfile 0

sendfile() blocks for 1.3 seconds here. However, the machine is:

$ sysctl hw.model hw.machine hw.ncpu
hw.model: Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
hw.machine: amd64
hw.ncpu: 16

So sfbufs don't apply here, but the documentation doesn't say which resource
does apply... or I couldn't find the right doc.
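
For what it's worth, decoded into the FreeBSD sendfile(2) signature, the
traced call looks roughly like the sketch below. The descriptor and byte
counts are taken from the kdump output above; the non-NULL sf_hdtr pointer
in the trace (presumably the response headers) is omitted here, and the
SF_NODISKIO remark is only my reading of the man page, not something I have
verified on this box:

 #include <sys/types.h>
 #include <sys/socket.h>
 #include <sys/uio.h>

 /* Rough reconstruction of the traced call: fd 0x28 is the file just
  * opened with openat(), fd 0x211 is the client socket, 0x806f bytes
  * are sent starting at offset 0. */
 static int send_response_body(int filefd, int sockfd)
 {
     off_t sent = 0;

     /* flags = 0 reproduces what the trace shows: the call may sleep in
      * the kernel while pages are read from disk.  The man page documents
      * SF_NODISKIO to make it fail with EBUSY instead of blocking, which
      * is, as far as I understand, what nginx relies on when aio is
      * enabled. */
     if (sendfile(filefd, sockfd, 0, 0x806f, NULL, &sent, 0) == -1)
         return -1;

     return (int)sent;
 }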

> > What I don't like is a timeout greater than 5 seconds. Doesn't it mean
> > that the system is allowed to block for the whole timeout to collect events?
>
> Timeout as passed to kevent() is the nearest timer set in nginx.
> That is, nginx has nothing to do till the time specified, and
> allows kernel to block that long if there are no events.  Kernel
> is expected to return events as soon as it has any, or return 0 if
> the specified time limit expires.
>

This much I had already figured out from reading the nginx source code.
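
In code terms, the loop boils down to something like the sketch below (a
stripped-down illustration of how I read it, not the actual nginx code; the
nearest_timer_ms/handle_event/expire_timers helpers are hypothetical
stand-ins for nginx's timer rbtree and event handlers):

 #include <sys/types.h>
 #include <sys/event.h>
 #include <sys/time.h>
 #include <stdint.h>

 /* Hypothetical stand-ins for nginx's timer tree and event handlers. */
 uint64_t nearest_timer_ms(void);
 void     handle_event(struct kevent *ev);
 void     expire_timers(void);

 void event_loop(int kq)
 {
     struct kevent events[512];

     for (;;) {
         /* The timeout is just "time until the nearest nginx timer"; a
          * 5+ second value only means no timer is due earlier, it does
          * not allow the kernel to sit on events that are already ready. */
         uint64_t ms = nearest_timer_ms();
         struct timespec ts = {
             .tv_sec  = ms / 1000,
             .tv_nsec = (ms % 1000) * 1000000
         };

         int n = kevent(kq, NULL, 0, events, 512, &ts);

         for (int i = 0; i < n; i++)
             handle_event(&events[i]);   /* read/write handlers */

         expire_timers();                /* fire timers that are now due */
     }
 }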

--
> Maxim Dounin
> http://nginx.org/
> _______________________________________________
> nginx mailing list
> nginx at nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx
>



-- 
Руслан Закиров
Head of the Web Services Development Department
+7(916) 597-92-69, ruz @  <http://www.sports.ru/>

