Restarting service takes too much time

Tue Dec 6 00:34:36 UTC 2022

Hello!

On Mon, Dec 05, 2022 at 09:43:18PM +0100, Charlie Kilo wrote:

> I also know the problem from an environment with many sites and thousands
> of IPs to bind to. For us the problem is that nginx binds every worker to
> every IP sequentially, leading to a restart time of 10-15 minutes. The
> problem can easily be observed using strace on the master process during
> startup. We couldn't find an easy solution so far.

Could you please share some numbers and details of the
configuration?  Some strace output with timestamps might also be
helpful (something like "strace -ttT" would be great).

While binding listening sockets indeed happens sequentially, it is
expected to take seconds at most, not minutes, even with thousands
of listening sockets and even under load.  It would be interesting
to dig into what causes a 10-15 minute restart time.
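As a rough baseline for "seconds, not minutes", one can time the
plain socket()/bind()/listen() calls for many sockets opened one
after another.  The Python sketch below is not nginx code; the
function name time_listen_sockets is hypothetical, and it binds to
ephemeral ports on 127.0.0.1 (an assumption, so it runs without
root and without extra addresses configured).  It measures only the
kernel cost of these calls, not anything else nginx does at startup.

```python
import socket
import time


def time_listen_sockets(count, host="127.0.0.1", backlog=511):
    """Sequentially open `count` listening TCP sockets.

    Returns (elapsed_seconds, sockets_opened).  All sockets are closed
    before returning.  Port 0 asks the kernel for an ephemeral port, so
    no privileges are needed.  The backlog of 511 is an assumption
    chosen to mirror nginx's usual default on Linux.
    """
    sockets = []
    start = time.perf_counter()
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sockets.append(s)  # track before bind() so it is closed on failure
            s.bind((host, 0))
            s.listen(backlog)
        elapsed = time.perf_counter() - start
    finally:
        for s in sockets:
            s.close()
    return elapsed, len(sockets)


if __name__ == "__main__":
    elapsed, opened = time_listen_sockets(500)
    print(f"{opened} sockets in {elapsed:.4f} s")
```

Keep the count below the process file descriptor limit
("ulimit -n"), since every socket stays open until the loop ends.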

In particular, in ticket #2188 
(https://trac.nginx.org/nginx/ticket/2188), which was about 
speeding up "nginx -t" with lots of listening sockets under load, 
opening 20k listening sockets (expanded from about 1k sockets in 
the configuration with "listen ... reuseport" and multiple worker 
processes) was observed to take about 1 second without load (and 
up to 15 seconds under load, though this shouldn't affect restart).
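The ticket numbers give a per-socket cost that can serve as a
sanity check.  The sketch below assumes about 20 worker processes
(inferred from 20k / 1k) and a flat per-socket cost, both
simplifications; the function names are hypothetical.  At roughly
50 microseconds per socket, a 15-minute (900-second) bind phase
would imply on the order of 18 million sockets, which suggests
something other than the bind() calls themselves may be involved,
though this ignores load and any per-address effects.

```python
def expanded_socket_count(listen_sockets, worker_processes, reuseport):
    """Listening sockets actually opened.

    With reuseport, each configured listening socket is cloned for
    every worker process; without it, one socket is shared.
    """
    return listen_sockets * (worker_processes if reuseport else 1)


def implied_socket_count(total_seconds, seconds_per_socket):
    """Number of sockets a given bind time implies at a fixed per-socket cost."""
    return round(total_seconds / seconds_per_socket)


# About 1 second for the expanded set from the ticket (assumed 20 workers).
per_socket = 1.0 / expanded_socket_count(1000, 20, reuseport=True)  # ~50 us

# How many sockets would a 15-minute restart correspond to at that rate?
print(implied_socket_count(15 * 60, per_socket))
```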

Also note that nginx provides many ways to avoid opening that many
sockets in the first place (including using a single socket on a
wildcard address for a given port instead of a socket for each IP
address, and not using reuseport, which is really only needed if
you are balancing UDP).  If the issue you are observing is indeed
due to slow bind() calls, one possible solution might be to reduce
the number of listening sockets being used.
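To illustrate why one wildcard socket can replace per-address
sockets: a socket bound to 0.0.0.0 accepts connections to any local
IPv4 address, and getsockname() on the accepted connection reports
which local address the client actually connected to.  As far as I
know, this is how nginx serves several "listen" directives on the
same port when one of them is a wildcard, unless the "bind"
parameter is given.  The Python sketch below only shows the
socket-level behavior; the function name is hypothetical and it
uses loopback so it runs without extra addresses.

```python
import socket


def local_address_via_wildcard(connect_host="127.0.0.1"):
    """Accept one connection on a wildcard listener.

    Returns the local address the client connected to, as seen by the
    server side via getsockname() on the accepted socket.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("0.0.0.0", 0))  # one socket for every local IPv4 address
        listener.listen(1)
        port = listener.getsockname()[1]
        with socket.create_connection((connect_host, port)):
            conn, _peer = listener.accept()
            with conn:
                # The accepted socket knows which local IP was targeted,
                # so no per-IP listening socket is needed to tell them apart.
                return conn.getsockname()[0]
    finally:
        listener.close()


if __name__ == "__main__":
    print(local_address_via_wildcard())
```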

-- 
Maxim Dounin
http://mdounin.ru/