<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">I got a little further and confirmed this is definitely related to the binary upgrade.<div><br></div><div><div>www-data   988 99.9  0.7 365124 122784 ?       R    Jan30 131740:46 nginx: worker process</div><div>root      2800  0.0  1.0 361828 165044 ?       Ss   Jan05  27:54 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;</div></div><div><br></div><div>PID 2800 is nginx.old, also nginx/1.15.8, as we did two builds with slightly different compile options.</div><div><br></div><div>The processes do not respond to graceful kill signals; only a kill -9 was able to terminate them.</div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Apr 24, 2019 at 10:38 AM Mathew Heard <<a href="mailto:mat999@gmail.com">mat999@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Yesterday one of my techs reported that a production server had an nginx worker sitting at 99.9% CPU usage in top and not accepting new connections (but still being handed its share of them because of SO_REUSEPORT). I thought this might be related.<div><br></div><div>The worker was significantly older than its peers, so it appears to have been left over from a previous configuration reload. It was a child of the single running master process, and there was nothing of interest in the error logs.</div><div><br></div><div>Just sharing this here as it sounds related.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Feb 5, 2019 at 11:25 PM Tomas Kvasnicka <<a href="mailto:nzt4567@gmx.com" target="_blank">nzt4567@gmx.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi!<br>

<br>

I just wanted to add my 2 cents here….<br>

<br>

- We are facing a very similar issue. During reloads of a reuseport-enabled configuration, it sometimes happens that no workers accept new connections, bandwidth basically drops to 0, and it all reverts back to normal within a few seconds. Hurts but not deadly.<br>

- Our nginx uses an Intel QAT card, which allows only a limited number of user-space processes to use the card at any given moment. Therefore, our HUP handler has been patched quite a bit to shut down workers one by one and then use the ngx_reap_child to start a new worker with the updated configuration.<br>

- We only use the reuseport option in combination with the QAT-based servers (servers with the patched HUP behaviour). Therefore, I am not yet sure if changes in the reload mechanism are the cause of the issue - still investigating.<br>

- Thanks a lot for the “ss -nltp” tip.<br>

- Currently testing the “iptables-drop-empty-ACKs” workaround.<br>

- No two nginx masters are running simultaneously.<br>

- Reducing the number of workers is also not happening, due to the changes described above - the new value of worker_processes will simply be ignored.<br>

<br>

Thanks,<br>

Tomas<br>

_______________________________________________<br>

nginx-devel mailing list<br>

<a href="mailto:nginx-devel@nginx.org" target="_blank">nginx-devel@nginx.org</a><br>

<a href="http://mailman.nginx.org/mailman/listinfo/nginx-devel" rel="noreferrer" target="_blank">http://mailman.nginx.org/mailman/listinfo/nginx-devel</a></blockquote></div>

</blockquote></div>
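<div dir="ltr"><br></div>
<div dir="ltr">In case it helps anyone else spot the same thing, here is a rough sketch of how the check could be scripted. It assumes the listing at the top of this mail is in "ps aux" column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND). The helper names and the 90% threshold are made up for illustration and are not part of nginx or any existing tool.</div>

```python
from dataclasses import dataclass


@dataclass
class PsRow:
    user: str
    pid: int
    cpu: float
    stat: str
    start: str
    command: str


def parse_ps_aux_line(line: str) -> PsRow:
    """Parse one data line, assuming `ps aux` column order (an assumption)."""
    # Split on whitespace at most 10 times so the COMMAND column keeps its spaces.
    fields = line.split(None, 10)
    return PsRow(
        user=fields[0],
        pid=int(fields[1]),
        cpu=float(fields[2]),
        stat=fields[7],
        start=fields[8],
        command=fields[10],
    )


def suspect_workers(lines, cpu_threshold=90.0):
    """Return nginx worker rows whose %CPU is at or above the threshold.

    The 90.0 default is an arbitrary illustrative value, not a recommendation.
    """
    rows = [parse_ps_aux_line(line) for line in lines if line.strip()]
    return [
        row
        for row in rows
        if row.command.startswith("nginx: worker process")
        and row.cpu >= cpu_threshold
    ]
```

<div dir="ltr">Feeding it the two lines above should flag only PID 988. The START column is kept so the worker's age can be compared by eye against the master and its peers.</div>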