Restarting service takes too much time

Sat Dec 10 08:52:37 UTC 2022

Hi Maxim,

We have roughly 7k IPs in use (3k IPv6, 4k IPv4) and 52 workers.
That results in ~364,000 binds per port (7,000 addresses x 52 workers),
and twice that many sockets if I count ports 80 and 443.
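
For reference, here is the arithmetic behind those figures as a small
Python sketch. It assumes that with reuseport every worker gets its own
listening socket for each address and port; the variable names are only
for illustration.

```python
# Back-of-the-envelope count of listening sockets, assuming one socket
# per (address, port, worker) combination when reuseport is enabled.
ipv6_addrs = 3_000
ipv4_addrs = 4_000
workers = 52
ports = (80, 443)

addresses = ipv6_addrs + ipv4_addrs   # all addresses in use
per_port = addresses * workers        # sockets needed for a single port
total = per_port * len(ports)         # sockets needed for both ports

print(f"{per_port:,} sockets per port, {total:,} in total")
# prints "364,000 sockets per port, 728,000 in total"
```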

We do indeed have reuseport active. We already thought about using a
wildcard address on a socket, but didn't have time to investigate and
test it thoroughly.
If reuseport is really only needed for balancing UDP, we might be able to
get rid of it.
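
To illustrate why a wildcard socket can stand in for many per-address
sockets, here is a minimal Python sketch (not nginx code). It binds one
listener to 0.0.0.0 and uses getsockname() on the accepted connection to
recover the local address the client connected to. As far as I understand
the nginx documentation, nginx uses the same system call when it only
binds to *:port. The loopback address and the OS-chosen port are
assumptions made purely so the sketch runs anywhere without root.

```python
import socket

# One listening socket on the IPv4 wildcard address. Port 0 lets the OS
# pick a free port, so no special privileges are needed.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("0.0.0.0", 0))
listener.listen()
port = listener.getsockname()[1]

# Connect to one concrete local address (loopback in this sketch).
client = socket.create_connection(("127.0.0.1", port))
conn, peer = listener.accept()

# getsockname() on the accepted connection reports the local address the
# client actually used, even though the listener is bound to 0.0.0.0.
bound_addr = listener.getsockname()[0]
local_addr = conn.getsockname()[0]
print("listener bound to:", bound_addr)
print("connection arrived on:", local_addr)

for s in (conn, client, listener):
    s.close()
```

The point is only that a single socket per port is enough for the kernel
to tell us which address was hit; whether that fits our per-site
configuration still needs testing.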

We are aware of the need to reduce the number of listening sockets and
the config size per server. However, this will be challenging and will
involve changes on a lot of levels.
I'll have to look into that again.

Thank you for your suggestions in any case!

On Tue, Dec 6, 2022 at 1:34 AM Maxim Dounin <mdounin at mdounin.ru> wrote:

> Hello!
>
> On Mon, Dec 05, 2022 at 09:43:18PM +0100, Charlie Kilo wrote:
>
> > I know the problem also from an environment with many sites and thousands
> > of ips to bind to. for us the problem is that nginx binds every worker to
> > every ip sequentially - leading to a restart time of 10-15 minutes. the
> > problem can easily be observed using strace on the master process during
> > startup.. we couldn't find an easy solution so far.
>
> Could you please share some numbers and details of the
> configuration?  Some strace output with timestamps might also be
> helpful (something like "strace -ttT" would be great).
>
> While binding listening sockets indeed happens sequentially, it is
> expected to take at most seconds even with thousands of listening
> sockets, and even under load, not minutes.  It would be
> interesting to dig into what causes 10-15 minutes restart time.
>
> In particular, in ticket #2188
> (https://trac.nginx.org/nginx/ticket/2188), which was about
> speeding up "nginx -t" with lots of listening sockets under load,
> opening 20k listening sockets (expanded from about 1k sockets in
> the configuration with "listen ... reuseport" and multiple worker
> processes) was observed to take about 1 second without load (and
> up to 15 seconds under load, though this shouldn't affect restart).
>
> Also note that nginx provides a lot of ways to avoid opening
> that many sockets (including using a single socket on a
> wildcard address for a given port instead of a socket for each IP
> address, and not using reuseport, which is really needed only if
> you are balancing UDP).  If the issue you are observing is indeed
> due to slow bind() calls, one of the possible solutions might be
> to reduce the number of listening sockets being used.
>
> --
> Maxim Dounin
> http://mdounin.ru/