SO_REUSEPORT

Thu May 2 10:51:41 UTC 2019

Got a little further and confirmed that this is definitely related to the
binary upgrade.

www-data   988 99.9  0.7 365124 122784 ?       R    Jan30 131740:46 nginx: worker process
root      2800  0.0  1.0 361828 165044 ?       Ss   Jan05  27:54 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;

PID 2800 is nginx.old, also nginx/1.15.8 (we did two builds with slightly
different compile options).
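A rough way to spot leftovers like this is to compare each worker's age and CPU usage with its siblings under the same master. The sketch below is a hypothetical helper (not part of nginx) that assumes the output of `ps -eo pid,ppid,etimes,pcpu,args` from procps-ng, where `etimes` is elapsed seconds. The 2x age factor and the 90% CPU threshold are arbitrary picks, and the sample PIDs are made up.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    ppid: int
    etimes: int    # elapsed seconds since the process started
    pcpu: float    # CPU usage as reported by ps
    args: str


def parse_ps(text: str) -> list[Proc]:
    """Parse `ps -eo pid,ppid,etimes,pcpu,args` output, skipping the header line."""
    procs = []
    for line in text.strip().splitlines()[1:]:
        pid, ppid, etimes, pcpu, args = line.split(None, 4)
        procs.append(Proc(int(pid), int(ppid), int(etimes), float(pcpu), args))
    return procs


def suspicious_workers(procs: list[Proc], age_factor: float = 2.0,
                       cpu_limit: float = 90.0) -> list[int]:
    """Return PIDs of nginx workers that are much older than their siblings
    (same master) or pinned near 100% CPU. Thresholds are arbitrary."""
    by_master: dict[int, list[Proc]] = {}
    for p in procs:
        if p.args.startswith("nginx: worker process"):
            by_master.setdefault(p.ppid, []).append(p)
    flagged = []
    for siblings in by_master.values():
        median_age = statistics.median(w.etimes for w in siblings)
        for w in siblings:
            if w.etimes > age_factor * median_age or w.pcpu >= cpu_limit:
                flagged.append(w.pid)
    return sorted(flagged)


# Hypothetical sample: one leftover worker (101) under master 100.
SAMPLE = """\
  PID  PPID ELAPSED %CPU COMMAND
  100     1 9000000  0.0 nginx: master process /usr/sbin/nginx
  101   100 8000000 99.9 nginx: worker process
  102   100   86400  0.5 nginx: worker process
  103   100   86400  0.4 nginx: worker process
  104   100   86400  0.6 nginx: worker process
"""

print(suspicious_workers(parse_ps(SAMPLE)))  # [101]
```

Grouping by parent PID keeps workers of an old and a new master (as during a binary upgrade) from being compared against each other; the CPU check still catches a spinning worker that has no siblings.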

The processes do not respond to graceful kill signals; only kill -9 (SIGKILL)
was able to kill them.
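For reference, nginx documents SIGQUIT as graceful shutdown and SIGTERM as fast shutdown. A minimal POSIX-only sketch of an escalation helper (hypothetical, not an nginx tool) that tries those first and only falls back to SIGKILL after a grace period; the function names and timings are assumptions.

```python
import os
import signal
import time


def _alive(pid: int) -> bool:
    """True if `pid` still exists. Reaps it first if it is our own child."""
    try:
        reaped, _ = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return False
    except ChildProcessError:
        pass  # not our child (e.g. an nginx worker): probe with signal 0 instead
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def stop_process(pid: int, grace: float = 10.0, poll: float = 0.1) -> str:
    """Escalate SIGQUIT -> SIGTERM -> SIGKILL, waiting `grace` seconds after each.

    Returns the name of the signal after which the process disappeared.
    A zombie that its (other) parent has not reaped still counts as alive.
    """
    for sig in (signal.SIGQUIT, signal.SIGTERM, signal.SIGKILL):
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            return "already gone"
        deadline = time.monotonic() + grace
        while time.monotonic() < deadline:
            if not _alive(pid):
                return sig.name
            time.sleep(poll)
    return "still alive"
```

A worker stuck like the one above would come back as `"SIGKILL"`; a healthy worker that is still draining connections may need a longer `grace`.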

On Wed, Apr 24, 2019 at 10:38 AM Mathew Heard <mat999 at gmail.com> wrote:

> Yesterday one of my techs reported that a production server had an nginx
> worker sitting at 99.9% CPU usage in top and not accepting new connections
> (while still being handed its share of them due to SO_REUSEPORT). I
> thought this might be related.
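Why a stuck worker still gets its share: with SO_REUSEPORT, Linux (3.9+) hashes each incoming connection onto one of the listening sockets bound to the port at handshake time, whether or not the process owning that socket ever calls accept(). The sketch below assumes Linux semantics (other systems do not balance this way): it binds two sockets to one loopback port, opens 40 client connections without accepting anything, and then counts how many landed in each socket's accept queue.

```python
import socket
import time


def reuseport_listeners(n: int, host: str = "127.0.0.1"):
    """Bind n listening sockets to one port via SO_REUSEPORT (Linux >= 3.9)."""
    socks, port = [], 0
    for _ in range(n):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))          # first bind picks a free port ...
        port = s.getsockname()[1]     # ... the remaining sockets reuse it
        s.listen(128)
        socks.append(s)
    return socks, port


def distribute(n_listeners: int = 2, n_clients: int = 40) -> list[int]:
    """Connect n_clients and count how many land in each listener's accept queue.

    Nobody calls accept() while the clients connect, yet the kernel still
    completes the handshakes and spreads them across all listeners.
    """
    socks, port = reuseport_listeners(n_listeners)
    clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(n_clients)]
    time.sleep(0.2)  # let the final ACKs reach the accept queues
    counts = []
    for s in socks:
        s.setblocking(False)
        n = 0
        while True:  # drain this listener's queue to count what it was given
            try:
                conn, _ = s.accept()
            except BlockingIOError:
                break
            conn.close()
            n += 1
        counts.append(n)
    for sock in clients + socks:
        sock.close()
    return counts


print(distribute())  # the exact split varies with the client source ports
```

If one of those listeners belonged to a worker spinning at 100% CPU, its share would simply sit in the queue until the clients time out.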
>
> The worker was significantly older than its peers, so it appears to
> have been left over from a previous configuration reload. It was a child
> of the single running master process, and there was nothing of interest
> in the error logs.
>
> Just sharing this here as it sounds related.
>
> On Tue, Feb 5, 2019 at 11:25 PM Tomas Kvasnicka <nzt4567 at gmx.com> wrote:
>
>> Hi!
>>
>> I just wanted to add my 2 cents here.
>>
>> - We are facing a very similar issue. During reloads of a reuseport-enabled
>> configuration, it sometimes happens that no workers accept new connections,
>> bandwidth basically drops to 0, and everything returns to normal within a
>> few seconds. It hurts, but it is not deadly.
>> - Our nginx uses the Intel QAT card, which allows only a limited number of
>> user-space processes to use the card at any given moment. Therefore, our
>> HUP handler has been patched quite a bit to shut down workers one by one
>> and then use ngx_reap_child to start a new worker with the updated
>> configuration.
>> - We only use the reuseport option in combination with the QAT-based
>> servers (servers with the patched HUP behaviour), so I am not yet sure
>> whether our changes to the reload mechanism are the cause of the issue;
>> still investigating.
>> - Thanks a lot for the “ss -nltp” tip.
>> - Currently testing the “iptables-drop-empty-ACKs” workaround.
>> - No two nginx masters are running simultaneously.
>> - We are also not reducing the number of workers: because of the changes
>> described above, a new worker_processes value is simply ignored.
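On the `ss -nltp` tip: for a LISTEN socket, ss reports the current accept-queue length in Recv-Q and the configured backlog in Send-Q. Since reuseport gives each worker its own listening socket, a socket whose worker has stopped accepting shows a Recv-Q that keeps growing. A small hypothetical parser for that output; it assumes the column layout of recent iproute2 versions, and the sample below is made up for illustration.

```python
import re

# One entry of the "users:((...),(...))" column, e.g. ("nginx",pid=101,fd=7)
USER_RE = re.compile(r'\("([^"]*)",pid=(\d+),fd=(\d+)\)')


def parse_ss_listen(text: str) -> list[tuple[str, int, int, list[int]]]:
    """Parse `ss -nltp` output into (local_addr, recv_q, send_q, pids) rows.

    For LISTEN sockets, Recv-Q is the number of connections waiting in the
    accept queue and Send-Q is the configured backlog.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "LISTEN":
            continue  # header or non-listening line
        recv_q, send_q, local = int(fields[1]), int(fields[2]), fields[3]
        pids = sorted({int(m.group(2)) for m in USER_RE.finditer(line)})
        rows.append((local, recv_q, send_q, pids))
    return rows


def backed_up(rows, threshold: int = 0):
    """Rows whose accept queue holds more than `threshold` pending connections."""
    return [row for row in rows if row[1] > threshold]


# Made-up output: two reuseport sockets on :443, one of them not being drained.
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0      511          0.0.0.0:443       0.0.0.0:*     users:(("nginx",pid=102,fd=7))
LISTEN 129    511          0.0.0.0:443       0.0.0.0:*     users:(("nginx",pid=101,fd=7))
"""

for addr, recv_q, send_q, pids in backed_up(parse_ss_listen(SAMPLE)):
    print(f"{addr}: {recv_q}/{send_q} pending, owned by {pids}")
# prints: 0.0.0.0:443: 129/511 pending, owned by [101]
```

Running it a few seconds apart and comparing Recv-Q separates a briefly busy worker from one that has stopped accepting altogether.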
>>
>> Thanks,
>> Tomas
>> _______________________________________________
>> nginx-devel mailing list
>> nginx-devel at nginx.org
>> http://mailman.nginx.org/mailman/listinfo/nginx-devel
>
>