One upstream connection blocks another upstream connection

Maxim Dounin mdounin at mdounin.ru
Tue Mar 13 17:01:23 UTC 2018


Hello!

On Tue, Mar 13, 2018 at 12:39:05PM +0000, lje at napatech.com wrote:

> I'm investigating a problem where I get delays in requests that are 
> proxied to an upstream service.
> By looking at the debug log it seems that one request is 
> blocking the worker so it can't complete another request.
> 
> If you look at the log below, request *352 is handled by worker 
> 18728. This worker then starts to process request *360, but for 
> some reason it is blocked from 13:53:53 until 13:54:03. How can 
> that happen?
> 
> The upstream service for request *360 doesn't send any data in 
> the time interval 13:53:53-13:54:03.
> But the upstream service for request *352 responds nearly 
> immediately.
> (I have examined the communication between nginx and the 
> upstream in wireshark)
> 
> Another observation: in the time interval 13:53:53-13:54:03 the 
> worker process seems to be in state D (uninterruptible sleep).
> 
> So my question is: what can block worker 18728 so that it 
> doesn't complete request *352?
> 
> OS: Redhat 7.4
> Nginx: 1.12.2
> 
> Hopefully I have provided enough details.
> 
> 2018/03/09 13:53:40 [debug] 18726#0: *249 input buf #95517
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe temp offset: 588029952
> 2018/03/09 13:53:40 [debug] 18726#0: *249 input buf #95518
> 2018/03/09 13:53:32 [debug] 18728#0: *189 readv: eof:0, avail:1
> 2018/03/09 13:53:40 [debug] 18726#0: *249 pipe offset: 368738304
> 2018/03/09 13:53:32 [debug] 18728#0: *189 readv: 2, last:4096
> 2018/03/09 13:53:40 [debug] 18726#0: *249 pipe buf ls:1 0000556CF522BAF0, pos 0000556CF522BAF0, size: 4096
> 2018/03/09 13:53:40 [debug] 18726#0: *249 pipe buf ls:1 0000556CF5233250, pos 0000556CF5233250, size: 4096
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe recv chain: 8192
> 2018/03/09 13:53:40 [debug] 18726#0: *249 pipe buf ls:1 0000556CF522EB20, pos 0000556CF522EB20, size: 4096
> 2018/03/09 13:53:32 [debug] 18728#0: *189 input buf #144060
> 2018/03/09 13:53:40 [debug] 18726#0: *249 size: 8192
> 2018/03/09 13:53:32 [debug] 18728#0: *189 input buf #144061
> 2018/03/09 13:53:40 [debug] 18726#0: *249 writev: 22, 8192, 368738304
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe offset: 588029952
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe buf ls:1 0000556CF5220480, pos 0000556CF5220480, size: 4096
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe buf ls:1 0000556CF5233250, pos 0000556CF5233250, size: 4096
> 2018/03/09 13:53:40 [debug] 18726#0: *249 pipe temp offset: 368746496
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe buf ls:1 0000556CF522CAF0, pos 0000556CF522CAF0, size: 4096
> 2018/03/09 13:53:53 [debug] 18729#0: *352 writev: 418 of 418
> 2018/03/09 13:53:40 [debug] 18726#0: *249 readv: eof:0, avail:1
> 2018/03/09 13:53:32 [debug] 18728#0: *189 size: 8192
> 2018/03/09 13:53:53 [debug] 18729#0: *352 chain writer out: 0000000000000000
> 2018/03/09 13:53:40 [debug] 18726#0: *249 readv: 2, last:4096
> 2018/03/09 13:53:53 [debug] 18729#0: *352 event timer del: 14: 1520603638051
> 2018/03/09 13:53:32 [debug] 18728#0: *189 writev: 18, 8192, 588029952
> 2018/03/09 13:53:53 [debug] 18729#0: *352 event timer add: 14: 86400000:1520690033051

The mix of different timestamps in the logs suggests that the 
workers are blocked for long stretches doing something.  nginx only 
updates its cached time when a worker returns to the event loop, so 
the time logged by a blocked worker process is not updated for a 
long time.

The reasons can vary, and more information/logs are needed to say 
anything for sure.  In this particular case my best guess is that 
your backend server is much faster than the disk you use for 
proxy_temp_path, and so nginx loops for a long time buffering the 
response to disk.  For example, the response in *189 has already 
buffered about 600M, and there is no indication in the quoted log 
lines that it stopped reading from the upstream.  At the same time 
that process thinks the current time is 13:53:32, which is 21 
seconds behind the 13:53:53 logged by pid 18729 at the same moment.

An obvious workaround would be to disable or limit disk buffering, 
e.g. with "proxy_max_temp_file_size 0;".  Additionally, using 
larger memory buffers (proxy_buffer_size, proxy_buffers) may help 
to avoid such monopolization of a worker process.
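A minimal sketch of such a configuration (the location and buffer 
sizes below are illustrative assumptions, not values from the 
original report) could look like:

    location /service/ {
        proxy_pass http://backend;        # hypothetical upstream

        # Do not buffer the response body to temp files on disk;
        # nginx then reads from the upstream no faster than the
        # in-memory buffers and the client can absorb the data.
        proxy_max_temp_file_size 0;

        # Larger in-memory buffers (example values) reduce the
        # number of read/write iterations per response.
        proxy_buffer_size 16k;
        proxy_buffers 8 64k;
    }

Note that with disk buffering disabled a slow client will keep the 
upstream connection open for longer, since nginx can no longer 
absorb the whole response into temp files and release the backend 
early.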

See also https://trac.nginx.org/nginx/ticket/1431 for a detailed 
explanation of a similar problem as observed with websocket 
proxying.

[...]

-- 
Maxim Dounin
http://mdounin.ru/
