High CPU usage with RHEL9 and nginx at 20Gbps
Marcin Wanat
marcin.wanat at gmail.com
Mon Nov 21 12:03:25 UTC 2022
Hi,
I have a problem with high CPU usage in nginx workers on a recent
Rocky Linux 9 setup.
I have 10 identical servers, each with a dual-port 25Gbps Mellanox
ConnectX-5 NIC, 6x Samsung PM983 NVMe drives, 512GB RAM and an Epyc 7402
CPU.
9 of them run Rocky Linux 8.7 (some with the stock kernel, some with
5.18.10-1.el8.elrepo.x86_64) with nginx, and each passes ~20Gbps of
SSL/HTTP2 traffic using 24 workers and the reuseport config option. Each
worker has about 10-12% CPU usage, and strace of an nginx worker looks
like this (normal):
strace: Process 73930 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 72.59    0.018036           1      9329       195 write
 15.61    0.003879           9       400           io_submit
  5.70    0.001416           2       681           epoll_wait
  2.93    0.000729           1       569        88 read
  1.81    0.000449           1       394           io_getevents
  0.81    0.000202           3        51           sendto
  0.52    0.000129           1       106           fcntl
  0.02    0.000005           5         1           shutdown
  0.01    0.000002           1         2           epoll_ctl
  0.00    0.000000           0         5           getpid
  0.00    0.000000           0         1           recvfrom
  0.00    0.000000           0         1           setsockopt
  0.00    0.000000           0         1           accept4
------ ----------- ----------- --------- --------- ----------------
100.00    0.024847           2     11541       283 total
On the 10th server we are now testing a Rocky Linux 9 configuration, and
each worker uses about 60-80% CPU, with the following strace from an
nginx worker:
strace: Process 1966 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 88.44    0.461237        1130       408           io_submit
 10.68    0.055674           2     18658       147 write
  0.25    0.001283           1       655       191 read
  0.20    0.001033           2       435           epoll_wait
  0.18    0.000914           5       156           fcntl
  0.08    0.000420           1       306           futex
  0.07    0.000378           5        65           sendto
  0.07    0.000348           1       271           io_getevents
  0.02    0.000085           9         9           openat
  0.01    0.000037          12         3           accept4
  0.01    0.000034           8         4           close
  0.00    0.000023           2         9           newfstatat
  0.00    0.000021           0        23           getpid
  0.00    0.000015           2         6           setsockopt
  0.00    0.000008           1         6           epoll_ctl
  0.00    0.000006           2         3           recvfrom
------ ----------- ----------- --------- --------- ----------------
100.00    0.521516          24     21017       338 total
As you can see, the RL9 worker spends most of its time in io_submit: it
takes 1130 usecs/call, while the RL8 servers need only 9 usecs/call in
io_submit.
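To put a number on the gap, here is a rough Python sketch that recomputes the io_submit cost per call and its share of traced syscall time from the figures in the two strace summaries above. The dictionary and function names are made up for this illustration; only the seconds and call counts come from the strace output.

```python
# Figures copied from the two strace -c summaries: (seconds, calls) for io_submit.
# The names IO_SUBMIT, TOTAL_SECONDS and summarize are illustrative only.
IO_SUBMIT = {
    "RL8": (0.003879, 400),
    "RL9": (0.461237, 408),
}

# "total" row of each summary: seconds spent in all traced syscalls.
TOTAL_SECONDS = {"RL8": 0.024847, "RL9": 0.521516}


def usecs_per_call(seconds, calls):
    """Average time per call in microseconds."""
    return seconds * 1_000_000 / calls


def summarize():
    """Return per-host io_submit cost and its share of traced syscall time."""
    result = {}
    for host, (seconds, calls) in IO_SUBMIT.items():
        result[host] = {
            "usecs_per_call": usecs_per_call(seconds, calls),
            "share_of_syscall_time": seconds / TOTAL_SECONDS[host],
        }
    return result


if __name__ == "__main__":
    stats = summarize()
    for host, s in stats.items():
        print(
            f"{host}: {s['usecs_per_call']:.1f} usec/call, "
            f"{s['share_of_syscall_time']:.1%} of traced syscall time"
        )
    ratio = stats["RL9"]["usecs_per_call"] / stats["RL8"]["usecs_per_call"]
    print(f"io_submit per-call cost ratio RL9/RL8: {ratio:.0f}x")
```

The number of io_submit calls is nearly the same on both hosts (400 vs 408), so the difference is in the cost of each call rather than in how often nginx submits I/O.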
The only thing that has changed is the OS (RL8 -> RL9). All servers have
identical hardware and serve an almost identical amount of traffic
(~19-20Gbps) from identical files with an identical number of
connections. iostat on all servers shows about 50% util on each drive
and similar rKB/s, r/s and rareq-sz, so disk I/O should not be the
problem.
nginx-1.22.1 compiled with:
./configure --prefix=/usr/local/nginx --with-http_mp4_module \
    --with-http_secure_link_module --with-http_stub_status_module \
    --with-http_ssl_module --with-http_v2_module --with-pcre \
    --with-file-aio --with-threads --with-cc-opt=' -DTCP_FASTOPEN=23' \
    --with-http_sub_module
Important parts of nginx.conf:
worker_processes 24;
worker_cpu_affinity auto;
worker_rlimit_nofile 81920;
events {
    worker_connections 2000;
    use epoll;
}
sendfile off;
aio on;
directio 4096;
directio_alignment 4k;
tcp_nodelay on;
Any ideas what could be causing the problem?