nginx stuck in tight loop sometimes

James Beal james_ at catbus.co.uk
Tue Jan 19 12:47:11 UTC 2021


We run a fairly high-volume site on 4 front-end nginx servers, each with:

* AMD EPYC 7402P 24-Core Processor
* INTEL SSDPELKX020T8 (2TB NVMe)
* Dual Broadcom BCM57416 NetXtreme-E 10GBase-T
* 512GB of RAM

We have a fairly complex nginx config with sharded caches, as explained in https://www.nginx.com/blog/shared-caches-nginx-plus-cache-clusters-part-1/

We see this problem on:

nginx version: nginx/1.19.6
built by gcc 8.3.0 (Debian 8.3.0-6)
built with OpenSSL 1.1.1d  10 Sep 2019
TLS SNI support enabled
configure arguments: --add-module=/root/incubator-pagespeed-ngx-latest-stable --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_ssl_module --with-http_stub_status_module --with-pcre-jit --with-http_secure_link_module --with-http_v2_module --with-http_realip_module --with-stream_geoip_module --http-scgi-temp-path=/tmp --http-uwsgi-temp-path=/tmp --http-fastcgi-temp-path=/tmp --http-proxy-temp-path=/tmp --http-log-path=/var/log/nginx/access --error-log-path=/var/log/nginx/error --pid-path=/var/run/nginx.pid --conf-path=/etc/nginx/nginx.conf --sbin-path=/usr/sbin --prefix=/usr --with-threads

Pagespeed is our only third-party module; it is version 1.13.35.2-0.

Some nginx processes start to spin in a tight loop. strace shows the same write failing over and over:

write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
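
For readers unfamiliar with this error, the sketch below reproduces only the symptom: a non-blocking write to a pipe that nobody drains eventually fails with EAGAIN. It is not a claim about why nginx or pagespeed keeps retrying; the function name and the 24-byte filler payload are made up for illustration.

```python
import errno
import os


def fill_pipe_until_eagain(chunk: bytes = b"x" * 24) -> int:
    """Write 24-byte records to a non-blocking pipe that is never read.

    Returns the errno of the first write that fails because the pipe is full.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # same effect as O_NONBLOCK on the write end
    try:
        while True:
            try:
                os.write(write_fd, chunk)
            except BlockingIOError as exc:
                # The kernel buffer is full and nothing is reading from it.
                return exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(errno.errorcode[fill_pipe_until_eagain()])
```

A caller that retries such a write immediately, without waiting for the descriptor to become writable, would produce a strace pattern like the one above.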

Looking in /proc:

root@ao3-front08:/proc/799697/fd# ls -l 168
l-wx------ 1 nginx nginx 64 Jan 18 22:05 168 -> 'pipe:[2914414548]'

root@ao3-front08:/proc# grep 2914414548 /tmp/fds
lr-x------ 1 nginx nginx 64 Jan 18 22:05 799697/fd/167 -> pipe:[2914414548]
l-wx------ 1 nginx nginx 64 Jan 18 22:05 799697/fd/168 -> pipe:[2914414548]
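
The same lookup can be done without a saved listing. This is a minimal sketch, assuming a Linux /proc layout and enough privileges to read other processes' fd directories; `find_pipe_holders` and its `proc_root` parameter are hypothetical names, not part of any nginx tooling.

```python
import os
from pathlib import Path
from typing import List, Tuple


def find_pipe_holders(inode: int, proc_root: str = "/proc") -> List[Tuple[int, int]]:
    """Return sorted (pid, fd) pairs whose fd symlink points at pipe:[inode]."""
    target = f"pipe:[{inode}]"
    holders = []
    for pid_dir in Path(proc_root).iterdir():
        if not pid_dir.name.isdigit():
            continue  # skip non-process entries such as "self" or "net"
        fd_dir = pid_dir / "fd"
        try:
            fd_names = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd_name in fd_names:
            try:
                if os.readlink(fd_dir / fd_name) == target:
                    holders.append((int(pid_dir.name), int(fd_name)))
            except OSError:
                continue  # fd was closed while we were scanning
    return sorted(holders)


if __name__ == "__main__":
    for pid, fd in find_pipe_holders(2914414548):
        print(pid, fd)
```

In the listing above, both ends of the pipe (fd 167 for reading, fd 168 for writing) belong to the same process, 799697.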

The issue happens more often when load is higher. Does anyone have any advice? My current hack of killing processes that have used more than 1800 seconds of CPU feels wrong.
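
For context, the check behind that hack looks roughly like the sketch below. It assumes Linux, where fields 14 and 15 of /proc/<pid>/stat hold user and system CPU time in clock ticks; the function names are hypothetical and the code only reports, it does not kill anything.

```python
import os


def cpu_seconds(stat_line: str, ticks_per_second: int) -> float:
    """Total user + system CPU time, in seconds, from a /proc/<pid>/stat line."""
    # Field 2 (comm) may contain spaces or parentheses, so split after the last ')'.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    utime_ticks = int(after_comm[11])  # field 14 of the full line
    stime_ticks = int(after_comm[12])  # field 15 of the full line
    return (utime_ticks + stime_ticks) / ticks_per_second


def over_cpu_limit(pid: int, limit_seconds: float = 1800.0) -> bool:
    """True if the process has used more CPU time than limit_seconds."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    with open(f"/proc/{pid}/stat") as stat_file:
        return cpu_seconds(stat_file.read(), ticks_per_second) > limit_seconds


if __name__ == "__main__":
    print(over_cpu_limit(os.getpid()))
```

This only identifies workers that have burned a lot of CPU; it cannot tell a spinning worker from a legitimately busy one, which is one reason it is a stopgap rather than a fix.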


