<div dir="ltr"><div dir="ltr"></div>>> If you've seen a <br>>> percentage of connections being dropped for some time - likely <br>>> there is another problem elsewhere. <div> </div><div>That's definitely what I observed. It was around 50% of this customer's connections, and strace on all workers (including the shutting-down worker) did not show the missed connections at the accept level (grepping for a unique testing IP).</div><div><br></div><div>The only strange thing I was able to note was the one process remaining in the "worker process is shutting down" state (it's not uncommon for us to have a few such workers hanging around for a while, since WebSocket or similar long-lived connections keep them open). This is what led me to this theory.</div><div><br>Further reloads did not resolve the issue; it took a full restart of nginx to get everything back to normal.</div><div><br></div><div>As far as I am aware, no other nginx instance was started on the server (nginx is managed by systemd).</div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Feb 2, 2019 at 1:13 AM Maxim Dounin <<a href="mailto:mdounin@mdounin.ru">mdounin@mdounin.ru</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello!<br>

<br>

On Fri, Feb 01, 2019 at 11:04:50AM +1100, Mathew Heard wrote:<br>

<br>

> Hit a rather strange issue today on a production service during a<br>

> configuration reload (evident from the worker processes in the process of<br>

> being shut down). During this reload, a percentage of connections were not<br>

> getting accepted (and hence not processed). I was able to confirm that none<br>

> of the processes were accepting the connections.<br>

> <br>

> Our configuration includes the reuseport option, so my theory was that for<br>

> some reason connections were still being routed to the shutting down<br>

> worker, which was not accepting new connections.<br>

<br>

With "listen ... reuseport", nginx creates a listening socket for <br>

each worker process.  And on configuration reload these sockets <br>

are passed to the new worker processes, so there shouldn't be <br>

problems.<br>
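A minimal Python sketch of the per-worker layout described above, assuming Linux and Python's socket.SO_REUSEPORT. The helper names are hypothetical, and each "worker" here is just a socket in a single process, not an nginx worker. Every socket is bound to the same port, and the kernel spreads incoming connections across them:<br>

```python
import select
import socket


def open_worker_listeners(n_workers, host="127.0.0.1"):
    """Open one SO_REUSEPORT listening socket per simulated worker, all on one port."""
    listeners, port = [], 0
    for _ in range(n_workers):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))  # port 0 on the first pass lets the kernel pick one
        s.listen(128)
        port = s.getsockname()[1]  # later sockets join the same port
        listeners.append(s)
    return listeners, port


def count_dispatch(listeners, port, n_clients):
    """Connect n_clients and count how many land in each listening socket."""
    clients = [socket.create_connection(("127.0.0.1", port), timeout=2)
               for _ in range(n_clients)]
    counts = [0] * len(listeners)
    accepted = 0
    while accepted < n_clients:
        ready, _, _ = select.select(listeners, [], [], 2)
        if not ready:
            break  # nothing arrived within the timeout
        for s in ready:
            conn, _ = s.accept()
            conn.close()
            counts[listeners.index(s)] += 1
            accepted += 1
    for c in clients:
        c.close()
    return counts


if __name__ == "__main__":
    listeners, port = open_worker_listeners(4)
    print(port, count_dispatch(listeners, port, 20))
    for s in listeners:
        s.close()
```

The split between sockets follows the kernel's hash of each connection, so the per-socket counts vary from run to run; only the total is fixed.<br>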

<br>

The only "risky" case is reducing the number of worker processes.  <br>

Reducing the number of worker processes means that some of the <br>

listening sockets will be closed, and on Linux this can result <br>

in rejecting some of the connection requests sitting in these <br>

sockets when these sockets are closed.  (AFAIK, this is properly <br>

handled only on DragonFly BSD, where connection requests are <br>

redistributed to other sockets in such a case.)<br>

<br>

This is, however, not about "a percentage of connections", but <br>

about a small number of connections sitting in the listening <br>

socket when an old worker process is instructed to exit gracefully <br>

and closes its listening socket.<br>
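A Python sketch of that case on Linux, with two SO_REUSEPORT sockets standing in for worker sockets (an illustration only, not nginx code; the function names are hypothetical). A connection is queued in one socket, that socket is closed without accept(), and the client sees a reset. It assumes net.ipv4.tcp_migrate_req is at its default of 0 on kernels that have that setting; with migration enabled, the queued connection may be handed to the remaining socket instead:<br>

```python
import select
import socket


def reuseport_listener(port):
    """Open a listening socket on 127.0.0.1:port with SO_REUSEPORT set."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(16)
    return s


def close_one_listener_demo():
    a = reuseport_listener(0)
    port = a.getsockname()[1]
    b = reuseport_listener(port)  # second worker socket on the same port

    # The kernel completes the handshake and queues the connection in ONE socket.
    c1 = socket.create_connection(("127.0.0.1", port), timeout=2)
    ready, _, _ = select.select([a, b], [], [], 2)
    assert len(ready) == 1, "expected exactly one socket to hold the connection"
    victim = ready[0]
    survivor = b if victim is a else a

    # Simulate a gracefully exiting worker: close its socket without accept().
    victim.close()
    try:
        first = "eof" if c1.recv(1) == b"" else "data"
    except ConnectionResetError:
        first = "reset"  # the queued connection request was rejected
    finally:
        c1.close()

    # Later connections can only land in the socket that is still open.
    c2 = socket.create_connection(("127.0.0.1", port), timeout=2)
    conn, _ = survivor.accept()
    conn.sendall(b"ok")
    second = c2.recv(2)
    for s in (conn, c2, survivor):
        s.close()
    return first, second


if __name__ == "__main__":
    print(close_one_listener_demo())
```

Which socket receives the first connection depends on the kernel's hash, so the sketch closes whichever one holds it. Only connections already sitting in the closed socket are lost, which matches "a small number of connections" rather than an ongoing percentage.<br>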

<br>

If you've changed the number of worker processes and seen several <br>

connections dropped - this may be the case.  If you've seen a <br>

percentage of connections being dropped for some time - likely <br>

there is another problem elsewhere.<br>

<br>

In particular, one common caveat with "listen ... reuseport" is <br>

that the listening socket no longer prevents multiple instances of <br>

nginx (or even different servers) from running on the same port.  <br>

As a result, accidentally starting another nginx instance can <br>

easily screw things up.<br>
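A small Python sketch of that caveat, assuming Linux and two unrelated sockets in one process standing in for two separate servers (the helper names are hypothetical). When both set SO_REUSEPORT, the second bind to an occupied port succeeds silently; without it, the bind fails with EADDRINUSE:<br>

```python
import errno
import socket


def try_listen(port, reuseport):
    """Bind and listen on 127.0.0.1:port, optionally with SO_REUSEPORT."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuseport:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind(("127.0.0.1", port))
        s.listen(16)
    except OSError:
        s.close()
        raise
    return s


def second_instance_can_bind(reuseport):
    """Return True if a second, independent listener can take the same port."""
    first = try_listen(0, reuseport)
    port = first.getsockname()[1]
    try:
        second = try_listen(port, reuseport)
        second.close()
        return True
    except OSError as e:
        if e.errno == errno.EADDRINUSE:
            return False
        raise
    finally:
        first.close()


if __name__ == "__main__":
    print("with reuseport:", second_instance_can_bind(True))
    print("without reuseport:", second_instance_can_bind(False))
```

On Linux this sharing is limited to sockets created under the same effective user ID, so a second instance started as the same user is exactly the case that slips through without an error.<br>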

<br>

-- <br>

Maxim Dounin<br>

<a href="http://mdounin.ru/" rel="noreferrer" target="_blank">http://mdounin.ru/</a><br>

_______________________________________________<br>

nginx-devel mailing list<br>

<a href="mailto:nginx-devel@nginx.org" target="_blank">nginx-devel@nginx.org</a><br>

<a href="http://mailman.nginx.org/mailman/listinfo/nginx-devel" rel="noreferrer" target="_blank">http://mailman.nginx.org/mailman/listinfo/nginx-devel</a><br>

</blockquote></div></div></div>