Nginx prematurely closing connections when reloaded

Sébastien Rebecchi srebecchi at kameleoon.com
Fri Mar 29 08:40:57 UTC 2024


Hi Igor,

I did not have an error_log directive in the main context, so it used the
default configuration, which seems to be why I got only 1 log file. I added
the directive (see the sketch after the log output below) and now I have more
logs when I do nginx -s reload:
2024/03/29 09:04:20 [notice] 1064394#0: signal process started
2024/03/29 09:04:20 [notice] 3718160#0: signal 1 (SIGHUP) received from
1064394, reconfiguring
2024/03/29 09:04:20 [notice] 3718160#0: reconfiguring
2024/03/29 09:04:20 [notice] 3718160#0: using the "epoll" event method
2024/03/29 09:04:20 [notice] 3718160#0: start worker processes
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064395
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064396
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064397
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064398
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064399
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064400
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064401
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064402
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064403
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064404
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064405
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064406
2024/03/29 09:04:20 [notice] 1063598#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063599#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063600#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063601#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063602#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063603#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063604#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063607#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063608#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063597#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063605#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063609#0: gracefully shutting down
2024/03/29 09:04:23 [notice] 3718160#0: signal 17 (SIGCHLD) received from
3989432
2024/03/29 09:04:23 [notice] 3718160#0: worker process 3989432 exited with
code 0
2024/03/29 09:04:23 [notice] 3718160#0: signal 29 (SIGIO) received
2024/03/29 09:04:26 [notice] 1060347#0: exiting
2024/03/29 09:04:26 [notice] 1060347#0: exit
2024/03/29 09:04:26 [notice] 3718160#0: signal 17 (SIGCHLD) received from
1060347
2024/03/29 09:04:26 [notice] 3718160#0: worker process 1060347 exited with
code 0
2024/03/29 09:04:26 [notice] 3718160#0: signal 29 (SIGIO) received
2024/03/29 09:04:29 [notice] 3718160#0: signal 17 (SIGCHLD) received from
3989423
2024/03/29 09:04:29 [notice] 3718160#0: worker process 3989423 exited with
code 0
2024/03/29 09:04:29 [notice] 3718160#0: signal 29 (SIGIO) received
... etc ...
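
For reference, a minimal sketch of the directive I added in the main (root)
context of nginx.conf; the path and level below are placeholders rather than
my actual values:

--- BEGIN ---
# main context, outside http {}: master/worker notices are written to this file
error_log /var/log/nginx/error.log notice;
--- END ---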

Could the problem I encounter be linked to that discussion?
https://mailman.nginx.org/pipermail/nginx-devel/2024-January/YSJATQMPXDIBETCDS46OTKUZNOJK6Q22.html
It seems to be a race condition between a client that has started to send a
new request at the same time the server has decided to close the idle
connection. Is there a plan to add a bugfix to nginx to handle that properly?
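
In the meantime, a possible frontend-side workaround might be to keep the
frontend's idle timeout for upstream keepalive connections well below the
backend's keepalive_timeout (75s by default), so the frontend is normally the
side that closes idle connections, and optionally to let proxy_next_upstream
retry non-idempotent requests. A minimal sketch with illustrative values only;
it narrows the race window but does not remove it, and a reload on the backend
still closes idle connections regardless of any timeout:

--- BEGIN ---
upstream data_api {
    random;

    server IP_1:80 max_fails=3 fail_timeout=30s;
    # ... the other servers as in the original configuration ...

    keepalive 20;
    # close idle upstream connections on the frontend well before the
    # backend's keepalive_timeout would close them (requires nginx >= 1.15.3)
    keepalive_timeout 20s;
}

location / {
    proxy_pass http://data_api;

    proxy_http_version 1.1;
    proxy_set_header Connection "";

    # ... other proxy_* directives unchanged ...

    # allow retrying POST requests on another upstream after a connection
    # error; beware of possible duplicate processing on the backend
    proxy_next_upstream error timeout http_502 http_504 non_idempotent;
}
--- END ---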

Thanks,

Sébastien

Le ven. 29 mars 2024 à 00:04, Igor Ippolitov <iippolitov at nginx.com> a
écrit :

> Sébastien,
>
> Is it possible that messages go to another log file? These messages go to
> the main error log file, defined in the root context.
> Another common pitfall is a log level above notice. Try setting the error
> log to a more verbose level, maybe?
>
> Regards,
> Igor.
>
>
> On 28/03/2024 18:27, Sébastien Rebecchi wrote:
>
> Hi Igor,
>
> Thanks for the answer.
>
> I really do get the message 'signal process started' every time I do 'nginx
> -s reload', and this is the only log line I have; I don't have the other
> lines you mentioned. Is there anything to do to enable those logs?
>
> Sébastien
>
> Le jeu. 28 mars 2024, 16:40, Igor Ippolitov <iippolitov at nginx.com> a
> écrit :
>
>> Sébastien,
>>
>> The message about the signal process is only the beginning of the process.
>> You are interested in messages like the following:
>>
>> 2024/03/26 13:36:36 [notice] 723#723: signal 1 (SIGHUP) received from
>> 69064, reconfiguring
>> 2024/03/26 13:36:36 [notice] 723#723: reconfiguring
>> 2024/03/26 13:36:36 [notice] 723#723: using the "epoll" event method
>> 2024/03/26 13:36:36 [notice] 723#723: start worker processes
>> 2024/03/26 13:36:36 [notice] 723#723: start worker process 69065
>> 2024/03/26 13:36:36 [notice] 723#723: start worker process 69066
>> 2024/03/26 13:36:36 [notice] 723#723: start cache manager process 69067
>> 2024/03/26 13:36:36 [notice] 61903#61903: gracefully shutting down
>> 2024/03/26 13:36:36 [notice] 61905#61905: exiting
>> 2024/03/26 13:36:36 [notice] 61903#61903: exiting
>> 2024/03/26 13:36:36 [notice] 61904#61904: gracefully shutting down
>> 2024/03/26 13:36:36 [notice] 61904#61904: exiting
>> 2024/03/26 13:36:36 [notice] 61903#61903: exit
>>
>>
>> Note the 'gracefully shutting down' and 'exiting' messages from the workers,
>> and the 'start' and 'reconfiguring' messages from the master process.
>> There should be a similar sequence somewhere in your logs.
>> Having these logs may help explain what happens on a reload.
>>
>> Kind regards,
>> Igor.
>>
>> On 26/03/2024 12:41, Sébastien Rebecchi wrote:
>>
>> Hi Igor
>>
>> There are no special logs on the IP_1 (the reloaded one) side, only 1 log
>> line, which is expected:
>> --- BEGIN ---
>> 2024/03/26 13:37:55 [notice] 3928855#0: signal process started
>> --- END ---
>>
>> I did not configure worker_shutdown_timeout, so it is unlimited.
>>
>> Sébastien.
>>
>> Le lun. 25 mars 2024 à 17:59, Igor Ippolitov <iippolitov at nginx.com> a
>> écrit :
>>
>>> Sébastien,
>>>
>>> Nginx should keep active connections open and wait for a request to
>>> complete before closing.
>>> A reload starts a new set of workers while old workers wait for old
>>> connections to shut down.
>>> The only exception I'm aware of is having worker_shutdown_timeout
>>> configured: in this case a worker will wait until this timeout and forcibly
>>> close a connection. By default there is no timeout.
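>>>
>>> A minimal example of that directive, for illustration only (the 30s value
>>> is an arbitrary placeholder, not a recommendation):
>>>
>>> --- BEGIN ---
>>> # main context: close any connections still open 30 seconds after a
>>> # graceful shutdown (e.g. a reload) begins
>>> worker_shutdown_timeout 30s;
>>> --- END ---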
>>>
>>> It would be interesting to see the error log of the nginx at IP_1 (the
>>> reloaded one) while the reload happens. It may explain the reason for the
>>> connection resets.
>>>
>>> Kind regards,
>>> Igor.
>>>
>>> On 25/03/2024 12:31, Sébastien Rebecchi wrote:
>>>
>>> Hello
>>>
>>>
>>> I have an issue with nginx prematurely closing connections when a reload
>>> is performed.
>>>
>>>
>>> I have some nginx servers configured to proxy_pass requests to an
>>> upstream group. This group itself is composed of several servers which are
>>> nginx themselves, and is configured to use keepalive connections.
>>>
>>> When I trigger a reload (-s reload) on the nginx of one of the servers
>>> targeted by the upstream, I see in the error logs of all the servers in
>>> front that the connection was reset by the nginx which was reloaded.
>>>
>>>
>>> Here is the configuration of the upstream group (IPs are hidden, replaced
>>> by IP_X):
>>>
>>> --- BEGIN ---
>>>
>>> upstream data_api {
>>>     random;
>>>
>>>     server IP_1:80 max_fails=3 fail_timeout=30s;
>>>     server IP_2:80 max_fails=3 fail_timeout=30s;
>>>     server IP_3:80 max_fails=3 fail_timeout=30s;
>>>     server IP_4:80 max_fails=3 fail_timeout=30s;
>>>     server IP_5:80 max_fails=3 fail_timeout=30s;
>>>     server IP_6:80 max_fails=3 fail_timeout=30s;
>>>     server IP_7:80 max_fails=3 fail_timeout=30s;
>>>     server IP_8:80 max_fails=3 fail_timeout=30s;
>>>     server IP_9:80 max_fails=3 fail_timeout=30s;
>>>     server IP_10:80 max_fails=3 fail_timeout=30s;
>>>
>>>     keepalive 20;
>>> }
>>>
>>> --- END ---
>>>
>>>
>>> Here is the configuration of the location using this upstream:
>>>
>>> --- BEGIN ---
>>>
>>> location / {
>>>     proxy_pass http://data_api;
>>>
>>>     proxy_http_version 1.1;
>>>     proxy_set_header Connection "";
>>>
>>>     proxy_set_header Host $host;
>>>     proxy_set_header X-Real-IP $real_ip;
>>>     proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
>>>
>>>     proxy_connect_timeout 2s;
>>>     proxy_send_timeout 6s;
>>>     proxy_read_timeout 10s;
>>>
>>>     proxy_next_upstream error timeout http_502 http_504;
>>> }
>>>
>>> --- END ---
>>>
>>>
>>> And here is the kind of error message I get when I reload the nginx of
>>> "IP_1":
>>>
>>> --- BEGIN ---
>>>
>>> 2024/03/25 11:24:25 [error] 3758170#0: *1795895162 recv() failed (104:
>>> Connection reset by peer) while reading response header from upstream,
>>> client: CLIENT_IP_HIDDEN, server: SERVER_HIDDEN, request: "POST
>>> /REQUEST_LOCATION_HIDDEN HTTP/2.0", upstream: "
>>> http://IP_1:80/REQUEST_LOCATION_HIDDEN", host: "HOST_HIDDEN", referrer:
>>> "REFERRER_HIDDEN"
>>>
>>> --- END ---
>>>
>>>
>>> I thought -s reload was doing a graceful shutdown of connections. Is it
>>> due to the fact that nginx cannot handle that when using keepalive
>>> connections? Is it a bug?
>>>
>>> I am using nginx 1.24.0 everywhere, no particular
>>>
>>>
>>> Thank you for any help.
>>>
>>>
>>> Sébastien
>>>
>>> _______________________________________________
>>> nginx mailing list
>>> nginx at nginx.org
>>> https://mailman.nginx.org/mailman/listinfo/nginx
>>>
>>>
>>>
>>
>