One upstream connection blocks another upstream connection
Maxim Dounin
mdounin at mdounin.ru
Tue Mar 13 17:01:23 UTC 2018
Hello!
On Tue, Mar 13, 2018 at 12:39:05PM +0000, lje at napatech.com wrote:
> I'm investigating a problem where I get delays in requests that are
> proxied to an upstream service.
> By looking at the debug log it seems that one request is
> blocking the worker so it can't complete other requests.
>
> If you look at the log below, request *352 is handled by worker
> 18728. This worker then starts to process request *360, but for
> some reason it is blocked from 13:53:53 until 13:54:03. How can that
> happen?
>
> The upstream service for request *360 doesn't send any data in
> the time interval 13:53:53-13:54:03.
> But the upstream service for request *352 responds nearly
> immediately.
> (I have examined the communication between nginx and the
> upstream in wireshark)
>
> Another observation: in the time interval 13:53:53-13:54:03 the
> worker process seems to be in state D (uninterruptible sleep).
>
> So my question is: what can block worker 18728 so that it doesn't
> complete request *352?
>
> OS: Red Hat 7.4
> nginx: 1.12.2
>
> Hopefully I have provided enough details.
>
> 2018/03/09 13:53:40 [debug] 18726#0: *249 input buf #95517
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe temp offset: 588029952
> 2018/03/09 13:53:40 [debug] 18726#0: *249 input buf #95518
> 2018/03/09 13:53:32 [debug] 18728#0: *189 readv: eof:0, avail:1
> 2018/03/09 13:53:40 [debug] 18726#0: *249 pipe offset: 368738304
> 2018/03/09 13:53:32 [debug] 18728#0: *189 readv: 2, last:4096
> 2018/03/09 13:53:40 [debug] 18726#0: *249 pipe buf ls:1 0000556CF522BAF0, pos 0000556CF522BAF0, size: 4096
> 2018/03/09 13:53:40 [debug] 18726#0: *249 pipe buf ls:1 0000556CF5233250, pos 0000556CF5233250, size: 4096
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe recv chain: 8192
> 2018/03/09 13:53:40 [debug] 18726#0: *249 pipe buf ls:1 0000556CF522EB20, pos 0000556CF522EB20, size: 4096
> 2018/03/09 13:53:32 [debug] 18728#0: *189 input buf #144060
> 2018/03/09 13:53:40 [debug] 18726#0: *249 size: 8192
> 2018/03/09 13:53:32 [debug] 18728#0: *189 input buf #144061
> 2018/03/09 13:53:40 [debug] 18726#0: *249 writev: 22, 8192, 368738304
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe offset: 588029952
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe buf ls:1 0000556CF5220480, pos 0000556CF5220480, size: 4096
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe buf ls:1 0000556CF5233250, pos 0000556CF5233250, size: 4096
> 2018/03/09 13:53:40 [debug] 18726#0: *249 pipe temp offset: 368746496
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe buf ls:1 0000556CF522CAF0, pos 0000556CF522CAF0, size: 4096
> 2018/03/09 13:53:53 [debug] 18729#0: *352 writev: 418 of 418
> 2018/03/09 13:53:40 [debug] 18726#0: *249 readv: eof:0, avail:1
> 2018/03/09 13:53:32 [debug] 18728#0: *189 size: 8192
> 2018/03/09 13:53:53 [debug] 18729#0: *352 chain writer out: 0000000000000000
> 2018/03/09 13:53:40 [debug] 18726#0: *249 readv: 2, last:4096
> 2018/03/09 13:53:53 [debug] 18729#0: *352 event timer del: 14: 1520603638051
> 2018/03/09 13:53:32 [debug] 18728#0: *189 writev: 18, 8192, 588029952
> 2018/03/09 13:53:53 [debug] 18729#0: *352 event timer add: 14: 86400000:1520690033051
The mix of different times in the logs suggests that workers are
blocked for a long time doing something, and hence the cached time
in some worker processes is not updated for a long time (nginx only
refreshes its cached time between event loop iterations, so a worker
stuck in a long-running operation keeps logging stale timestamps).
Reasons can be different, and more information/logs are needed to
say anything for sure. In this particular case my best guess is
that your backend server is much faster than the disk you use for
proxy_temp_path, and so nginx loops for a long time buffering a
response to disk. For example, the response in *189 has already
buffered about 600M, and there is no indication in the quoted log
lines that it stopped reading from the upstream anywhere. At the
same time the process thinks the current time is 13:53:32, which is
21 seconds behind the 13:53:53 logged by pid 18729 at the same
moment.
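Purely for illustration: if the worker spent those ~21 seconds
writing a few hundred megabytes of buffered response to disk, that
would imply a sustained write rate of only a few tens of MB/s, which
is realistic for a loaded spinning disk; the exact rate is a guess
on my part, not something taken from your logs.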
An obvious workaround would be to disable or limit disk buffering,
e.g. with "proxy_max_temp_file_size 0;". Additionally, using larger
memory buffers (proxy_buffer_size, proxy_buffers) might help to
avoid such monopolization of a worker process.
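For illustration only, a minimal sketch of what that could look like
(the location, the "backend" upstream name and the buffer sizes are
made up and would need tuning for your workload):

    location / {
        proxy_pass http://backend;

        # option 1: never spill responses to proxy_temp_path at all
        proxy_max_temp_file_size 0;

        # option 2: keep disk buffering, but give nginx more memory
        # for buffering the response so less of it goes to disk
        #proxy_buffer_size 16k;
        #proxy_buffers 64 16k;
    }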
See also https://trac.nginx.org/nginx/ticket/1431 for a detailed
explanation of a similar problem as observed with websocket
proxying.
[...]
--
Maxim Dounin
http://mdounin.ru/