nginx taking too long to respond

Maxim Dounin mdounin at mdounin.ru
Sat Jul 17 04:33:55 UTC 2021


Hello!

On Fri, Jul 16, 2021 at 02:54:29PM -0400, mrgonza78 wrote:

> I have an nginx used mainly as a reverse proxy for a couple of upstream
> services. This nginx has a simple endpoint used for health checks:
> 
>     location /ping { return 200 '{"ping":"successful"}'; }
> 
> The problem I'm having is that this ping takes too long to be responded:
> 
>     $ cat /proc/loadavg; date ; httpstat localhost/ping?foo=bar
>     2.93 1.98 1.94 8/433 16725
>     Thu Jul 15 15:25:08 UTC 2021
>     Connected to 127.0.0.1:80 from 127.0.0.1:42946
> 
>     HTTP/1.1 200 OK
>     Date: Thu, 15 Jul 2021 15:26:24 GMT
>     X-Request-ID: b8d276b0b3828113cfee3bf2daa01293
> 
>       DNS Lookup   TCP Connection   Server Processing   Content Transfer
>     [     4ms    |       0ms      |      76032ms      |        0ms       ]
>             namelookup:4ms
>                                 connect:4ms
>                                               starttransfer:76036ms
> total:76036ms
> 
> That ^ is telling me that the average load is low at the time of the request
> (2.93 the 1m average load for an 8-core server is ok)
> 
> Curl/httpstat initiated the request at 15:25:08 and response was obtained
> 15:26:24. Connection was stablished fast, request sent, then it took 76s for
> the server to respond.
> 
> If I look at the access log for this ping I see "req_time":"0.000" (this is
> the $request_time variable).
> 
>  
> {"t":"2021-07-15T15:26:24+00:00","id":"b8d276b0b3828113cfee3bf2daa01293","cid":"18581172","pid":"13631","host":"localhost","req":"GET
> /ping?foo=bar
> HTTP/1.1","scheme":"","status":"200","req_time":"0.000","body_sent":"21","bytes_sent":"373","content_length":"","request_length":"85","stats":"","upstream":{"status":"","sent":"","received":"","addr":"","conn_time":"","resp_time":""},"client":{"id":"#","agent":"curl/7.58.0","addr":",127.0.0.1:42946"},"limit_status":{"conn":"","req":""}}
>                                                                 
> 
> This is the access log format in case anybody wonders what are the rest of
> the values:
> 
> '{"t":"$time_iso8601","id":"$ring_request_id","cid":"$connection","pid":"$pid","host":"$http_host","req":"$request","scheme":"$http_x_forwarded_proto","status":"$status","req_time":"$request_time","body_sent":"$body_bytes_sent","bytes_sent":"$bytes_sent","content_length":"$content_length","request_length":"$request_length","stats":"$location_tag","upstream":{"status":"$upstream_status","sent":"$upstream_bytes_sent","received":"$upstream_bytes_received","addr":"$upstream_addr","conn_time":"$upstream_connect_time","resp_time":"$upstream_response_time"},"client":{"id":"$http_x_auth_appid$http_x_ringdevicetype#$remote_user$http_x_auth_userid","agent":"$http_user_agent","addr":"$http_x_forwarded_for,$remote_addr:$remote_port"},"limit_status":{"conn":"$limit_conn_status","req":"$limit_req_status"}}';
> 
> My question is: where could nginx have spent these 76s if the request just
> took 0s to be processed and responded?
> 
> Something special to mention is that the server is timing out a lof of
> connections with the upstreams at that moment as well: we see a lot of
> upstream timed out (110: Connection timed out) while reading response header
> from upstream and upstream server temporarily disabled while reading
> response header from upstream.
> 
> So, these two are related, what I can't see is why upstream timeouts would
> lead to a /ping taking 76s to be attended and responded when both cpu and
> load are low/acceptable.

Most likely nginx worker process was blocked on something, and 
wasn't able to handle requests due to this.  It processed the 
request as soon as it was unblocked.

Typical cases which can cause blocking include: serving huge files 
over fast links with sendfile without sendfile_max_chunk, serving 
files from NFS volumes, badly-written embedded Perl code which 
uses potentially long-running operations such as connecting to 
other servers or databases, badly-written 3rd party modules which 
use potentially long-running operations.

The 75s time might indicate that the culprit is a blocking 
connect() somewhere, so I would recommend checking for 3rd party 
modules and/or embedded Perl code first.

-- 
Maxim Dounin
http://mdounin.ru/


More information about the nginx mailing list