High CPU usage with RHEL9 and nginx at 20Gbps

Marcin Wanat marcin.wanat at gmail.com
Mon Nov 21 12:03:25 UTC 2022


Hi,

I have a problem with high worker CPU usage on a recent Rocky Linux 9 nginx
setup.

I have 10 identical servers, each with Mellanox ConnectX-5 dual 25Gbps
NIC, 6x Samsung PM983 NVMe drives, 512GB RAM and Epyc 7402 CPU.

9 of them are running Rocky Linux 8.7 (some with the stock kernel, some with
5.18.10-1.el8.elrepo.x86_64) with nginx and pass ~20Gbps of SSL/http2
traffic using 24 workers and the reuseport config option. Each worker uses
about 10-12% CPU, and an nginx worker strace looks like this (normal):

strace: Process 73930 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 72.59    0.018036           1      9329       195 write
 15.61    0.003879           9       400           io_submit
  5.70    0.001416           2       681           epoll_wait
  2.93    0.000729           1       569        88 read
  1.81    0.000449           1       394           io_getevents
  0.81    0.000202           3        51           sendto
  0.52    0.000129           1       106           fcntl
  0.02    0.000005           5         1           shutdown
  0.01    0.000002           1         2           epoll_ctl
  0.00    0.000000           0         5           getpid
  0.00    0.000000           0         1           recvfrom
  0.00    0.000000           0         1           setsockopt
  0.00    0.000000           0         1           accept4
------ ----------- ----------- --------- --------- ----------------
100.00    0.024847           2     11541       283 total


On the 10th server we are now testing a Rocky Linux 9 configuration, and each
worker is using about 60-80% CPU, with the following strace from an nginx
worker:

strace: Process 1966 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 88.44    0.461237        1130       408           io_submit
 10.68    0.055674           2     18658       147 write
  0.25    0.001283           1       655       191 read
  0.20    0.001033           2       435           epoll_wait
  0.18    0.000914           5       156           fcntl
  0.08    0.000420           1       306           futex
  0.07    0.000378           5        65           sendto
  0.07    0.000348           1       271           io_getevents
  0.02    0.000085           9         9           openat
  0.01    0.000037          12         3           accept4
  0.01    0.000034           8         4           close
  0.00    0.000023           2         9           newfstatat
  0.00    0.000021           0        23           getpid
  0.00    0.000015           2         6           setsockopt
  0.00    0.000008           1         6           epoll_ctl
  0.00    0.000006           2         3           recvfrom
------ ----------- ----------- --------- --------- ----------------
100.00    0.521516          24     21017       338 total


As you can see, it is spending a lot of time in io_submit: 1130 usec/call,
while the RL8 servers spend only 9 usec/call on io_submit.

The only thing that has changed is RL8 -> RL9. All servers have identical
hardware and serve an almost identical amount of traffic (~19-20Gbps) from
identical files with an identical number of connections. iostat on all
servers shows about 50% util on each drive and similar rKB/s, r/s and
rareq-sz, so disk IO should not be the problem.

nginx-1.22.1 compiled with:
./configure --prefix=/usr/local/nginx --with-http_mp4_module
--with-http_secure_link_module --with-http_stub_status_module
--with-http_ssl_module --with-http_v2_module --with-pcre
--with-file-aio --with-threads --with-cc-opt=' -DTCP_FASTOPEN=23'
--with-http_sub_module

Important parts of nginx.conf:
    worker_processes  24;
    worker_cpu_affinity auto;
    worker_rlimit_nofile 81920;
    events {
        worker_connections   2000;
        use epoll;
    }

    sendfile off;
    aio on;
    directio 4096;
    directio_alignment 4k;
    tcp_nodelay      on;
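
To check whether the regression is in the kernel native AIO path itself
rather than in nginx, I plan to time raw io_submit() calls against one of
the NVMe drives with a small standalone program along the lines of the rough
sketch below (untested; /mnt/nvme0/test.bin is just a placeholder for a file
of a few hundred MB on one of the drives, and the 512 KiB read size is an
arbitrary choice). Build with: gcc -O2 -o aio_bench aio_bench.c -laio

#define _GNU_SOURCE            /* for O_DIRECT */

#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BLOCK  (512 * 1024)    /* read size; must stay 4k-aligned for O_DIRECT */
#define ROUNDS 400

static long long nsec_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
    const char *path = "/mnt/nvme0/test.bin";   /* placeholder path */
    int fd = open(path, O_RDONLY | O_DIRECT);   /* same flag nginx uses for directio */
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, 4096, BLOCK) != 0) {  /* O_DIRECT needs an aligned buffer */
        perror("posix_memalign");
        return 1;
    }

    io_context_t ctx = 0;
    int ret = io_setup(32, &ctx);
    if (ret < 0) { fprintf(stderr, "io_setup: %s\n", strerror(-ret)); return 1; }

    long long total = 0, worst = 0;

    for (int i = 0; i < ROUNDS; i++) {
        struct iocb cb, *cbs[1] = { &cb };

        /* 4k-aligned offsets, like nginx with directio_alignment 4k */
        io_prep_pread(&cb, fd, buf, BLOCK, (long long) i * BLOCK);

        long long t0 = nsec_now();
        ret = io_submit(ctx, 1, cbs);            /* the call that is slow on RL9 */
        long long dt = nsec_now() - t0;
        if (ret != 1) { fprintf(stderr, "io_submit: %s\n", strerror(-ret)); return 1; }

        total += dt;
        if (dt > worst) worst = dt;

        struct io_event ev;
        io_getevents(ctx, 1, 1, &ev, NULL);      /* wait for the read to complete */
    }

    printf("io_submit: avg %lld us, worst %lld us over %d calls\n",
           total / ROUNDS / 1000, worst / 1000, ROUNDS);

    io_destroy(ctx);
    close(fd);
    return 0;
}

Running the same binary on one of the RL8 servers and on the RL9 server
against the same kind of file should show whether io_submit itself is
roughly 100x slower there, or whether the extra time only appears under
nginx load.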

Any ideas what could be causing the problem?
