Restarting service takes too much time

Sun Dec 11 23:30:03 UTC 2022

Hello!

On Sat, Dec 10, 2022 at 09:52:37AM +0100, Charlie Kilo wrote:

> we have roundabout 7k ips in use, 3k ipv6, 4k ipv4 and 52 workers.
> that results in ~364000 ips which need to be bound - twice that in sockets
> if i count port 80 and 443.
> 
> we have indeed reuseport active - we already thought about using a
> wildcard-address on a socket, but didnt have time to investigate and test
> thoroughly..
> if its really only useful for balancing udp we might be able to get rid of
> it.

Thanks for the details.  Running with 700k listening sockets 
indeed might be a challenge.

Further, it looks like Linux isn't very efficient when handling 
lots of listening sockets on the same port.  In my limited 
testing, binding 10k listening sockets on the same port takes 
about 10 seconds, binding 20k listening sockets takes 50 seconds, 
and binding 30k listening sockets takes 140 seconds, that is, the 
time grows much faster than linearly with the number of sockets.

The simplest and most effective solution should be to listen on 
the wildcard address on the relevant port somewhere in the 
configuration, such as "listen 80;" (with "reuseport" if needed, 
see below).  With this, nginx will open just one listening socket 
and will distribute connections based on the local address as 
obtained by getsockname(), see the description of the "bind" 
parameter of the "listen" directive (http://nginx.org/r/listen).  
The only additional change to the configuration this requires is 
removing all socket options from the per-IP listen directives, so 
nginx won't try to bind them separately.
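
A minimal sketch of such a configuration is shown here.  The 
addresses and the server name are placeholders, the catch-all 
server with "return 444;" is just one possible way to handle 
connections to addresses without a server of their own, and the 
separate "listen [::]:80" line assumes a wildcard socket is wanted 
for IPv6 as well, since "listen 80;" covers IPv4 only:

```nginx
# Catch-all server (hypothetical): the wildcard sockets are
# declared here, and this is the only place with socket options.
server {
    listen 80 reuseport;         # drop "reuseport" if not needed
    listen [::]:80 reuseport;    # IPv6 wildcard, assumed to be wanted

    # Close connections to addresses not listed in any other server.
    return 444;
}

# Per-IP server: address and port only, no socket options, so no
# separate bind() is made for these addresses.
server {
    listen 192.0.2.10:80;        # placeholder IPv4 address
    listen [2001:db8::10]:80;    # placeholder IPv6 address
    server_name example.com;     # placeholder name
}
```

The same pattern should apply to port 443.  Note that with wildcard 
sockets nginx will also accept connections on any other local 
address of the host, hence the catch-all server in the sketch.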

Not using "reuseport" should be an option too, but keep in mind 
that in nginx versions before 1.21.6 it might also be useful as a 
workaround for uneven distribution of connections between worker 
processes on modern Linux versions.  As an alternative solution, 
"accept_mutex on;" can be used (see 
https://trac.nginx.org/nginx/ticket/2285 for details).
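
For reference, the accept_mutex alternative would look roughly 
like this, assuming "reuseport" is removed from all listen 
directives at the same time:

```nginx
events {
    # Workers take turns accepting new connections.
    accept_mutex on;
}
```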

-- 
Maxim Dounin
http://mdounin.ru/