mat999 at gmail.com
Fri Feb 1 14:21:12 UTC 2019
>> If you've seen a
>> percentage of connections being dropped for some time - likely
>> there is another problem elsewhere.
That's definitely what I observed. It was around 50% of this customer's
connections and strace on all workers (including the shutting down worker)
did not show the missed connections at the accept level (grep on unique
The only thing strange I was able to note was the one process remaining in
"worker is shutting down state" (it's not uncommon for us to have a few
workers hanging around for a while due to websocket or similar connections
keeping workers open). This is why I formulated this theory.
Further reloads did not resolve the issue; it took a restart of the nginx
process to get everything back to normal.
As far as I am aware no other nginx process was started on the server
(systemd manages nginx).
On Sat, Feb 2, 2019 at 1:13 AM Maxim Dounin <mdounin at mdounin.ru> wrote:
> On Fri, Feb 01, 2019 at 11:04:50AM +1100, Mathew Heard wrote:
> > Hit a rather strange issue today on a production service during a
> > configuration reload (evident from the worker processes in the process of
> > being shut down). During this reload a percentage of connections were not
> > getting accepted (and hence not processed). I was able to confirm that none
> > of the processes were accepting the connections.
> > Our configuration includes the reuseport option so my theory was that for
> > some reason connections were still being routed to the shutting down
> > worker, which was not accepting new connections.
> With "listen ... reuseport" nginx creates a listening socket for
> each worker process. And on configuration reload these sockets
> are passed to the new worker processes, so there shouldn't be
> The only "risky" case is reducing the number of worker processes.
> Reducing the number of worker processes means that some of the
> listening sockets will be closed, and on Linux this can result
> in rejecting some of the connection requests sitting in these
> sockets when these sockets are closed. (AFAIK, this is properly
> handled only on DragonFly BSD, where connection requests are
> redistributed to other sockets in such a case.)
> This is, however, not about "a percentage of connections", but
> about a small number of connections sitting in the listening
> socket when the old worker process is instructed to exit gracefully
> and closes the listening socket.
> If you've changed the number of worker processes and seen several
> connections dropped - this may be the case. If you've seen a
> percentage of connections being dropped for some time - likely
> there is another problem elsewhere.
> In particular, one common caveat with "listen ... reuseport" is
> that the listening socket no longer prevents multiple instances of
> nginx (or even different servers) from running on the same port.
> As a result, accidentally starting another nginx instance can
> easily screw up things.
> Maxim Dounin
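
The caveat above, that a reuseport listening socket no longer blocks a second server from binding the same port, can be reproduced outside nginx. The following is only a minimal sketch in Python, not how nginx itself is implemented: the helper name can_bind_twice is hypothetical, and it assumes a platform that exposes SO_REUSEPORT (e.g. Linux 3.9 or later) with both sockets owned by the same user.

```python
import socket


def can_bind_twice(reuseport: bool) -> bool:
    """Return True if a second listening socket can bind to a port
    that already has a listening socket on it.

    Hypothetical helper for illustration only.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuseport:
            # The option must be set on every socket sharing the port,
            # and it must be set before bind().
            first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
            second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)

        # Port 0 lets the kernel pick a free port for the first listener.
        first.bind(("127.0.0.1", 0))
        first.listen(16)
        port = first.getsockname()[1]

        try:
            # This stands in for "another server instance" on the same port.
            second.bind(("127.0.0.1", port))
            second.listen(16)
        except OSError:
            # Without SO_REUSEPORT the kernel refuses the second bind.
            return False
        return True
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("plain listen, second bind allowed:", can_bind_twice(False))
    print("SO_REUSEPORT, second bind allowed:", can_bind_twice(True))
```

When both listeners are accepted, the kernel distributes incoming connections between them, so a stray second instance would receive a share of the traffic rather than failing at startup.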