nginx taking too long to respond

Fri Jul 16 18:54:29 UTC 2021

I have an nginx used mainly as a reverse proxy for a couple of upstream
services. This nginx has a simple endpoint used for health checks:

    location /ping { return 200 '{"ping":"successful"}'; }

The problem I'm having is that this ping takes too long to be responded:

    $ cat /proc/loadavg; date ; httpstat localhost/ping?foo=bar
    2.93 1.98 1.94 8/433 16725
    Thu Jul 15 15:25:08 UTC 2021
    Connected to 127.0.0.1:80 from 127.0.0.1:42946

    HTTP/1.1 200 OK
    Date: Thu, 15 Jul 2021 15:26:24 GMT
    X-Request-ID: b8d276b0b3828113cfee3bf2daa01293

      DNS Lookup   TCP Connection   Server Processing   Content Transfer
    [     4ms    |       0ms      |      76032ms      |        0ms       ]
            namelookup:4ms
                                connect:4ms
                                              starttransfer:76036ms
total:76036ms

That ^ is telling me that the average load is low at the time of the request
(2.93 the 1m average load for an 8-core server is ok)

Curl/httpstat initiated the request at 15:25:08 and response was obtained
15:26:24. Connection was stablished fast, request sent, then it took 76s for
the server to respond.

If I look at the access log for this ping I see "req_time":"0.000" (this is
the $request_time variable).

{"t":"2021-07-15T15:26:24+00:00","id":"b8d276b0b3828113cfee3bf2daa01293","cid":"18581172","pid":"13631","host":"localhost","req":"GET
/ping?foo=bar
HTTP/1.1","scheme":"","status":"200","req_time":"0.000","body_sent":"21","bytes_sent":"373","content_length":"","request_length":"85","stats":"","upstream":{"status":"","sent":"","received":"","addr":"","conn_time":"","resp_time":""},"client":{"id":"#","agent":"curl/7.58.0","addr":",127.0.0.1:42946"},"limit_status":{"conn":"","req":""}}

This is the access log format in case anybody wonders what are the rest of
the values:

'{"t":"$time_iso8601","id":"$ring_request_id","cid":"$connection","pid":"$pid","host":"$http_host","req":"$request","scheme":"$http_x_forwarded_proto","status":"$status","req_time":"$request_time","body_sent":"$body_bytes_sent","bytes_sent":"$bytes_sent","content_length":"$content_length","request_length":"$request_length","stats":"$location_tag","upstream":{"status":"$upstream_status","sent":"$upstream_bytes_sent","received":"$upstream_bytes_received","addr":"$upstream_addr","conn_time":"$upstream_connect_time","resp_time":"$upstream_response_time"},"client":{"id":"$http_x_auth_appid$http_x_ringdevicetype#$remote_user$http_x_auth_userid","agent":"$http_user_agent","addr":"$http_x_forwarded_for,$remote_addr:$remote_port"},"limit_status":{"conn":"$limit_conn_status","req":"$limit_req_status"}}';

My question is: where could nginx have spent these 76s if the request just
took 0s to be processed and responded?

Something special to mention is that the server is timing out a lof of
connections with the upstreams at that moment as well: we see a lot of
upstream timed out (110: Connection timed out) while reading response header
from upstream and upstream server temporarily disabled while reading
response header from upstream.

So, these two are related, what I can't see is why upstream timeouts would
lead to a /ping taking 76s to be attended and responded when both cpu and
load are low/acceptable.

Any idea?
Thanks 
Alejandro

Posted at Nginx Forum: https://forum.nginx.org/read.php?2,292073,292073#msg-292073