Restarting service takes too much time
mdounin at mdounin.ru
Tue Dec 6 00:34:36 UTC 2022
On Mon, Dec 05, 2022 at 09:43:18PM +0100, Charlie Kilo wrote:
> I know the problem also from an environment with many sites and thousands
> of ips to bind to. for us the problem is that nginx binds every worker to
> every ip sequentially - leading to a restart time of 10-15 minutes. the
> problem can easily be observed using strace on the master process during
> startup.. we couldn't find an easy solution so far.
Could you please share some numbers and details of the
configuration? Some strace output with timestamps might also be
helpful (something like "strace -ttT" would be great).
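
To make such a trace easier to read, the per-syscall time can be totalled with a small script. The following is only a rough sketch, not an nginx or strace tool: it assumes the trace was saved to a file (for example with strace's "-o" option, optionally with "-f"), and that each completed call ends with the "<seconds>" duration that "-T" appends. The names summarize() and format_report() are made up for this illustration.

```python
import re
from collections import defaultdict

# Matches the syscall name and the trailing "<seconds>" duration that
# strace's -T option appends to each completed call.
# An optional "[pid N]" or bare pid prefix (seen with -f) is tolerated.
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"   # optional pid prefix
    r"\d{2}:\d{2}:\d{2}\.\d+\s+"        # -tt wall-clock timestamp
    r"(?P<name>\w+)\("                  # syscall name
    r".*<(?P<secs>\d+\.\d+)>\s*$"       # -T duration in seconds
)


def summarize(lines):
    """Return {syscall: (count, total_seconds)} for the lines that parse."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            # Unfinished/resumed calls, signals and exit lines are skipped.
            continue
        entry = totals[match.group("name")]
        entry[0] += 1
        entry[1] += float(match.group("secs"))
    return {name: (count, total) for name, (count, total) in totals.items()}


def format_report(stats):
    """Return report lines, slowest syscall (by total time) first."""
    ordered = sorted(stats.items(), key=lambda item: -item[1][1])
    return [
        f"{name:<16} calls={count:<8} total={total:.6f}s"
        for name, (count, total) in ordered
    ]
```

With the trace in a file (the name trace.txt is hypothetical), print("\n".join(format_report(summarize(open("trace.txt"))))) lists the calls by total time. If bind() is at the top with a large total, slow binds are the likely cause; if not, the time is being spent elsewhere.
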
While binding listening sockets indeed happens sequentially, it is
expected to take at most seconds even with thousands of listening
sockets, and even under load, not minutes. It would be
interesting to dig into what causes 10-15 minutes restart time.
In particular, in ticket #2188
(https://trac.nginx.org/nginx/ticket/2188), which was about
speeding up "nginx -t" with lots of listening sockets under load,
opening 20k listening sockets (expanded from about 1k sockets in
the configuration with "listen ... reuseport" and multiple worker
processes) was observed to take about 1 second without load (and
up to 15 seconds under load, though this shouldn't affect restart).
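
As a rough sanity check of what a given host can do, the cost of opening many listening sockets can be measured outside of nginx. The sketch below is an assumption-laden approximation rather than a reproduction of what nginx does: it binds to a single loopback address with kernel-assigned ports, does not set reuseport, and the function name time_listening_sockets() is made up for this illustration. The backlog of 511 simply mirrors the default nginx uses on most platforms.

```python
import socket
import time


def time_listening_sockets(count, host="127.0.0.1", backlog=511):
    """Open `count` TCP listening sockets; return (sockets, elapsed_seconds).

    The caller is responsible for closing the returned sockets.
    """
    sockets = []
    start = time.perf_counter()
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sockets.append(sock)
            sock.bind((host, 0))   # port 0: the kernel picks a free port
            sock.listen(backlog)
    except OSError:
        # Typically the open-files limit; release what was opened so far.
        for sock in sockets:
            sock.close()
        raise
    elapsed = time.perf_counter() - start
    return sockets, elapsed
```

Calling time_listening_sockets(1000) and printing the elapsed time gives a baseline for the machine; the count is bounded by the process's open-files limit. If this finishes in well under a second while the nginx restart takes minutes, the delay is unlikely to come from bind() and listen() alone.
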
Also note that nginx provides a number of ways to avoid opening
that many sockets (including using a single socket on a wildcard
address for a given port instead of a socket for each IP address,
and not using reuseport, which is really needed only if you are
balancing UDP). If the issue you are observing is indeed due to
slow bind() calls, one possible solution would be to reduce the
number of listening sockets being used.