Random TCP Timeouts

s.schumacher nginx-forum at forum.nginx.org
Tue Sep 20 11:03:12 UTC 2022


Hello

I am running about sixty nginx web servers on a Proxmox cluster (which uses
KVM). The VMs are running the most recent version of Debian 11. They use
nginx, different versions of PHP-FPM, and MariaDB. I follow an
infrastructure-as-code approach: all the servers, with some exceptions in our
infrastructure such as Jitsi, are provisioned by Ansible and are therefore
nearly identical. I host standard TYPO3 installations as well as more complex
applications, usually developed in Laravel. Some time ago I started
configuring active checks in Checkmk (the checks use a Nagios plugin called
check_http) for our infrastructure and for customer projects that had gone
live, meaning they are accessible from the outside.

After this I started getting timeout errors about once or twice a day, spread
seemingly at random across the twenty servers that I monitor with active
checks. At first I considered these to be false positives, but last Friday it
happened during a Jitsi conference and was reported to me by a colleague. I
checked the nginx logs and found the following entries for the exact period
in which Checkmk couldn't reach the server. That period is less than one
minute (the interval between checks; the next check always comes back without
errors) and probably only a few seconds. What is the cause of this problem
and how can I fix it? Do you have a suggestion for how I could reproduce the
problem and then analyze it further?
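
Since the outage window is shorter than the interval between checks, one way to narrow it down might be a probe that attempts a TCP connect far more often than Checkmk does and records only the failures. The following is a minimal sketch, not a tested monitoring tool: the function names `probe` and `watch`, the one-second interval, and the three-second timeout are illustrative assumptions, and the host in the usage comment is a placeholder.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host, port, timeout=3.0):
    """Attempt a single TCP connect and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        # The peer answered with RST: nothing was accepting on that port.
        return "refused"
    except socket.timeout:
        # No answer at all within `timeout` seconds.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


def watch(host, port, interval=1.0, rounds=None):
    """Probe repeatedly and collect (UTC timestamp, result) for failures.

    `rounds=None` loops until interrupted; a number limits the probes.
    """
    failures = []
    done = 0
    while rounds is None or done < rounds:
        result = probe(host, port)
        if result != "ok":
            stamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
            failures.append((stamp, result))
            print(stamp, host, port, result, flush=True)
        done += 1
        time.sleep(interval)
    return failures


# Hypothetical usage (placeholder host name):
# watch("www.example.org", 443, interval=1.0)
```

Distinguishing "refused" from "timeout" matters here because the Checkmk message reports a refused connection, and the timestamps of the failures can then be compared with the nginx error log.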

Checkmk-Error-Message:

Summary connect to address 195.34.XXX.XXX and port 443: Connection refused
Details HTTP CRITICAL - Unable to open TCP socket

Checkmk-Recovery-Message:

Summary HTTP OK: HTTP/1.1 200 OK - 59404 bytes in 0.008 second response time

Nginx error log:

2022/09/16 11:18:42 [alert] 3212994#3212994: *2590 open socket #18 left in connection 5
2022/09/16 11:18:42 [alert] 3212994#3212994: *2494 open socket #15 left in connection 8
2022/09/16 11:18:42 [alert] 3212994#3212994: *2533 open socket #16 left in connection 9
2022/09/16 11:18:42 [alert] 3212994#3212994: *2534 open socket #17 left in connection 10
2022/09/16 11:18:42 [alert] 3212994#3212994: *2591 open socket #20 left in connection 11
2022/09/16 11:18:42 [alert] 3212994#3212994: *2573 open socket #24 left in connection 12
2022/09/16 11:18:42 [alert] 3212994#3212994: *2532 open socket #10 left in connection 13
2022/09/16 11:18:42 [alert] 3212994#3212994: *3230 open socket #28 left in connection 14
2022/09/16 11:18:42 [alert] 3212994#3212994: *2467 open socket #19 left in connection 15
2022/09/16 11:18:42 [alert] 3212994#3212994: *2535 open socket #21 left in connection 16
2022/09/16 11:18:42 [alert] 3212994#3212994: *3233 open socket #27 left in connection 17
2022/09/16 11:18:42 [alert] 3212994#3212994: *2771 open socket #30 left in connection 22
2022/09/16 11:18:42 [alert] 3212994#3212994: *2770 open socket #29 left in connection 23
2022/09/16 11:18:42 [alert] 3212994#3212994: *3234 open socket #22 left in connection 24
2022/09/16 11:18:42 [alert] 3212994#3212994: *3229 open socket #11 left in connection 26
2022/09/16 11:18:42 [alert] 3212994#3212994: *3231 open socket #32 left in connection 28
2022/09/16 11:18:42 [alert] 3212994#3212994: aborting
2022/09/16 11:20:19 [error] 3295994#3295994: *153 upstream timed out (110: Connection timed out) while reading response>
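
To compare such bursts across the twenty monitored servers, the alerts in the excerpt above could be counted per timestamp and worker PID. The following is a rough sketch under stated assumptions: it expects one entry per line, as in the error log file itself, the regular expressions are derived only from the lines quoted in this mail, and the function name `summarise` and the log path in the usage comment are illustrative.

```python
import re
from collections import Counter

# Matches e.g. "2022/09/16 11:18:42 [alert] 3212994#3212994: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] (?P<pid>\d+)#\d+: (?P<msg>.*)$"
)
LEAK_RE = re.compile(r"open socket #\d+ left in connection \d+")


def summarise(lines):
    """Count 'open socket ... left in connection' alerts per (time, PID)
    and list every 'aborting' entry as a (time, PID) pair."""
    leaks = Counter()
    aborts = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not look like error-log entries
        key = (match["ts"], int(match["pid"]))
        if LEAK_RE.search(match["msg"]):
            leaks[key] += 1
        elif match["msg"] == "aborting":
            aborts.append(key)
    return leaks, aborts


# Hypothetical usage (the path is an assumption, adjust as needed):
# with open("/var/log/nginx/error.log", encoding="utf-8", errors="replace") as fh:
#     leaks, aborts = summarise(fh)
#     for (ts, pid), count in sorted(leaks.items()):
#         print(ts, pid, count)
```

The resulting timestamps can then be lined up with the Checkmk alerts and with anything else that ran on the host at that moment.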

Yours sincerely

Stefan Malte Schumacher

Posted at Nginx Forum: https://forum.nginx.org/read.php?2,295249,295249#msg-295249


