Restarting service takes too much time
mdounin at mdounin.ru
Sun Dec 11 23:30:03 UTC 2022
On Sat, Dec 10, 2022 at 09:52:37AM +0100, Charlie Kilo wrote:
> we have roundabout 7k ips in use, 3k ipv6, 4k ipv4 and 52 workers.
> that results in ~364000 ips which need to be bound - twice that in sockets
> if i count port 80 and 443.
> we have indeed reuseport active - we already thought about using a
> wildcard-address on a socket, but didnt have time to investigate and test
> if its really only useful for balancing udp we might be able to get rid of

Thanks for the details. Running with 700k listening sockets
indeed might be a challenge.

Further, it looks like Linux isn't very efficient when handling
lots of listening sockets on the same port. In my limited
testing, binding 10k listening sockets on the same port takes
about 10 seconds, binding 20k listening sockets takes 50 seconds,
and binding 30k listening sockets takes 140 seconds.

The simplest and most effective solution should be to listen on
the wildcard address for the relevant port somewhere in the
configuration, such as "listen 80;" (with "reuseport" if needed,
see below), so nginx will open just one listening socket and will
distribute connections based on the local address as obtained by
getsockname(), see the description of the "bind" parameter of the
"listen" directive (http://nginx.org/r/listen). The only
additional change to the configuration this requires is removing
all socket options from the per-IP listen directives, so nginx
won't try to bind them separately.
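
For illustration, a minimal sketch of such a configuration might look
like the following. The addresses (taken from the documentation ranges),
server names, document roots, and the catch-all server block are all
hypothetical, and the "listen [::]:80" line assumes that IPv6 addresses
need their own wildcard socket alongside the IPv4 one.

```nginx
# Sketch only: every address, name, and path here is a placeholder.

# Wildcard listeners: the only listen directives carrying socket options.
server {
    listen 80 reuseport;
    listen [::]:80 reuseport;   # assumption: separate IPv6 wildcard socket
    return 444;                 # close connections to addresses with no per-IP server
}

# Per-IP servers: plain address:port only, so nginx does not bind them
# separately and selects the server by the local address of the connection.
server {
    listen 192.0.2.10:80;       # no backlog, reuseport, etc. here
    server_name a.example.com;
    root /var/www/a;
}

server {
    listen [2001:db8::10]:80;
    server_name b.example.com;
    root /var/www/b;
}
```

The same pattern would presumably be repeated for port 443, with the
socket options again kept only on the wildcard listen directives.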

Not using "reuseport" should be an option too, but keep in mind
that in nginx versions before 1.21.6 it might also be useful as a
workaround for uneven distribution of connections between worker
processes on modern Linux versions. As an alternative solution,
"accept_mutex on;" can be used (see
https://trac.nginx.org/nginx/ticket/2285 for details).
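
As a rough sketch, the "accept_mutex" alternative would be set in the
events block, with "reuseport" dropped from the listen directives. The
note about the default value is based on the nginx documentation and
should be checked against the version in use.

```nginx
# Sketch only: for nginx versions before 1.21.6 running without "reuseport".
events {
    accept_mutex on;    # workers take turns accepting new connections;
                        # the default has been "off" since nginx 1.11.3
}
```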