nginx stuck in tight loop sometimes

Tue Jan 19 13:08:35 UTC 2021

Hello!

On Tue, Jan 19, 2021 at 12:47:11PM +0000, James Beal wrote:

> We run quite a high-volume site with 4 front-end nginx servers, each with:
> * AMD EPYC 7402P 24-Core Processor
> * INTEL SSDPELKX020T8 (2TB NVMe)
> * Dual Broadcom BCM57416 NetXtreme-E 10GBase-T
> * 512GB of RAM
> We have a fairly complex nginx config with sharded caches as explained in https://www.nginx.com/blog/shared-caches-nginx-plus-cache-clusters-part-1/
> 
> We see this problem on:
> 
> nginx version: nginx/1.19.6
> built by gcc 8.3.0 (Debian 8.3.0-6)
> built with OpenSSL 1.1.1d  10 Sep 2019
> TLS SNI support enabled
> configure arguments: --add-module=/root/incubator-pagespeed-ngx-latest-stable --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_ssl_module --with-http_stub_status_module --with-pcre-jit --with-http_secure_link_module --with-http_v2_module --with-http_realip_module --with-stream_geoip_module --http-scgi-temp-path=/tmp --http-uwsgi-temp-path=/tmp --http-fastcgi-temp-path=/tmp --http-proxy-temp-path=/tmp --http-log-path=/var/log/nginx/access --error-log-path=/var/log/nginx/error --pid-path=/var/run/nginx.pid --conf-path=/etc/nginx/nginx.conf --sbin-path=/usr/sbin --prefix=/usr --with-threads
> 
> Pagespeed is our only third-party module; it is version 1.13.35.2-0.
> 
> Some nginx processes start to spin in a tight loop; strace shows:
> 
> write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
> write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
> write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
> write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
> write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
> write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
> 
> Looking in /proc, both ends of that pipe belong to the same process:
> 
> root@ao3-front08:/proc/799697/fd# ls -l 168
> l-wx------ 1 nginx nginx 64 Jan 18 22:05 168 -> 'pipe:[2914414548]'
> 
> root@ao3-front08:/proc# grep 2914414548 /tmp/fds
> lr-x------ 1 nginx nginx 64 Jan 18 22:05 799697/fd/167 -> pipe:[2914414548]
> l-wx------ 1 nginx nginx 64 Jan 18 22:05 799697/fd/168 -> pipe:[2914414548]
> 
> The issue happens more often when load is higher. Does anyone have
> any advice? My current hack of killing processes that have used
> more than 1800 seconds of CPU is wrong.

Are you able to reproduce the problem without any 3rd party 
modules?  Since nginx itself does not use pipes, this looks like a 
pagespeed problem.
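
For illustration only, the strace pattern quoted above is what a 
non-blocking write to a full pipe looks like. The sketch below is 
not nginx or pagespeed code; the function name and the 24-byte 
chunk size (chosen to match the write size in the strace output) 
are assumptions. It creates a pipe, never reads from it, and 
writes until the kernel returns EAGAIN:

```python
import errno
import os


def fill_pipe_until_eagain(chunk: bytes = b"x" * 24) -> int:
    """Write to a non-blocking pipe that nobody drains.

    Returns the errno of the first write the kernel refuses.
    Hypothetical helper for illustration; not taken from nginx
    or pagespeed.
    """
    read_fd, write_fd = os.pipe()
    # Make the write end non-blocking, so a full pipe yields an
    # error instead of putting the caller to sleep.
    os.set_blocking(write_fd, False)
    try:
        while True:
            os.write(write_fd, chunk)
    except BlockingIOError as exc:
        # The pipe buffer is full and nothing has read from it.
        return exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(errno.errorcode[fill_pipe_until_eagain()])
```

Once the pipe is full, every further write fails the same way 
until something reads from the other end. Code that retries 
immediately instead of waiting for the descriptor to become 
writable would spin much like the process in the strace output, 
though whether that is what happens here is a guess.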

-- 
Maxim Dounin
http://mdounin.ru/