SO_REUSEPORT

Fri Feb 1 14:21:12 UTC 2019

>> If you've seen a
>> percentage of connections being dropped for some time - likely
>> there is another problem elsewhere.

That's definitely what I observed. Around 50% of this customer's
connections were affected, and strace on all workers (including the
shutting-down worker) did not show the missed connections at the accept()
level (I grepped for a unique testing IP).

The only strange thing I was able to note was one process remaining in the
"worker is shutting down" state (it's not uncommon for us to have a few
workers hanging around for a while due to websocket or similar connections
keeping workers open). This is why I formulated this theory.

Further reloads did not resolve the issue; it took a restart of the nginx
process to get everything back to normal.

As far as I am aware, no other nginx process was started on the server
(systemd manages nginx).
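
The reason a stray second instance matters with reuseport can be shown with
a minimal Python sketch (not nginx code; it assumes Linux 3.9 or later, and
the helper names listen_on and second_instance_can_bind are made up for
illustration). Without SO_REUSEPORT a second listener on the same port
fails with EADDRINUSE; with it, the second bind quietly succeeds and the
kernel starts sharing incoming connections between both sockets.

```python
import errno
import socket


def listen_on(port, reuseport):
    """Bind and listen on 127.0.0.1:port.

    Returns the listening socket, or None if the port is already in use.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuseport:
        # Must be set before bind(), and on every socket sharing the port.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(16)
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None
        raise
    return sock


def second_instance_can_bind(reuseport):
    """Simulate a second server trying to listen on an already-used port."""
    first = listen_on(0, reuseport)          # port 0: kernel picks a free port
    port = first.getsockname()[1]
    second = listen_on(port, reuseport)      # the "accidental" second instance
    succeeded = second is not None
    for sock in (first, second):
        if sock is not None:
            sock.close()
    return succeeded
```

On Linux, second_instance_can_bind(False) should report that the second
bind was refused, while second_instance_can_bind(True) should report that
it was allowed, which is the situation where nothing warns you about the
extra listener.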

On Sat, Feb 2, 2019 at 1:13 AM Maxim Dounin <mdounin at mdounin.ru> wrote:

> Hello!
>
> On Fri, Feb 01, 2019 at 11:04:50AM +1100, Mathew Heard wrote:
>
> > Hit a rather strange issue today on a production service during a
> > configuration reload (evident from the worker processes in the process of
> > being shut down). During this reload a percentage of connections were not
> > getting accepted (and hence not processed). I was able to confirm that
> > none of the processes were accepting the connections.
> >
> > Our configuration includes the reuseport option so my theory was that for
> > some reason connections were still being routed to the shutting down
> > worker, which was not accepting new connections.
>
> With "listen ... reuseport" nginx creates a listening socket for
> each worker process.  And on configuration reload these sockets
> are passed to the new worker processes, so there shouldn't be
> problems.
>
> The only "risky" case is reducing the number of worker processes.
> Reducing the number of worker processes means that some of the
> listening sockets will be closed, and on Linux this can result
> in rejecting some of the connection requests sitting in these
> sockets when these sockets are closed.  (AFAIK, this is properly
> handled only on DragonFly BSD, where connection requests are
> redistributed to other sockets in such a case.)
>
> This is, however, not about "a percentage of connections", but
> about a small number of connections sitting in the listening
> socket when an old worker process is instructed to exit gracefully
> and closes the listening socket.
>
> If you've changed the number of worker processes and seen several
> connections dropped - this may be the case.  If you've seen a
> percentage of connections being dropped for some time - likely
> there is another problem elsewhere.
>
> In particular, one common caveat with "listen ... reuseport" is
> that the listening socket no longer prevents multiple instances of
> nginx (or even different servers) from running on the same port.
> As a result, accidentally starting another nginx instance can
> easily screw up things.
>
> --
> Maxim Dounin
> http://mdounin.ru/
> _______________________________________________
> nginx-devel mailing list
> nginx-devel at nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx-devel
>
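
For reference, the per-worker listening sockets described in the quoted
reply can be approximated with a small Python sketch. This is only an
illustration under stated assumptions, not how nginx implements it: it
assumes Linux 3.9 or later, runs everything in one process instead of
separate workers, and the names reuseport_listener and spread_connections
are hypothetical. It opens several listening sockets on one port, makes a
batch of client connections, and counts how many land in each socket's
accept queue.

```python
import select
import socket


def reuseport_listener(port, host="127.0.0.1", backlog=64):
    """Return a listening TCP socket bound with SO_REUSEPORT."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


def spread_connections(n_listeners=4, n_clients=32):
    """Connect n_clients times and count accepts per listening socket."""
    first = reuseport_listener(0)            # port 0: kernel picks a free port
    port = first.getsockname()[1]
    listeners = [first] + [
        reuseport_listener(port) for _ in range(n_listeners - 1)
    ]
    # Each connection is assigned to one listener by the kernel at SYN time.
    clients = [
        socket.create_connection(("127.0.0.1", port)) for _ in range(n_clients)
    ]
    counts = [0] * n_listeners
    accepted = 0
    while accepted < n_clients:
        ready, _, _ = select.select(listeners, [], [], 2.0)
        if not ready:
            break                            # nothing pending; give up
        for sock in ready:
            conn, _addr = sock.accept()
            conn.close()
            counts[listeners.index(sock)] += 1
            accepted += 1
    for sock in clients + listeners:
        sock.close()
    return counts
```

Calling spread_connections() returns one count per listener. A connection
queued on a given socket can only be accepted through that socket, so if
one of the listeners were closed without draining its queue, the
connections already assigned to it would be lost, which is the
worker-reduction case described above.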