Nginx fails under high load on Debian 10 vs no problems on Debian 9

janning nginx-forum at forum.nginx.org
Sun Feb 2 15:25:24 UTC 2020


This is my first post here, as we have never had any problems with nginx before.
We use 5 nginx servers as load balancers for our Spring Boot application.

We ran them for years on Debian 9 with the default nginx package (1.10.3).
Now we have switched three of our load balancers to Debian 10 with nginx 1.14.2.

At first everything ran smoothly. Then, under high load, we encountered some
problems. It starts with:

2020/02/01 17:10:55 [crit] 5901#5901: *3325390 SSL_write() failed while
sending to client, client: ...
2020/02/01 17:10:55 [crit] 5901#5901: *3306981 SSL_write() failed while
sending to client, client: ...

In between we get lots of:

2020/02/01 17:11:04 [error] 5902#5902: *3318748 upstream timed out (110:
Connection timed out) while connecting to upstream, ...
2020/02/01 17:11:04 [crit] 5902#5902: *3305656 SSL_write() failed while
sending response to client, client: ...
2020/02/01 17:11:30 [error] 5911#5911: unexpected response for
ocsp.int-x3.letsencrypt.org

It ends with:
2020/02/01 17:11:33 [error] 5952#5952: unexpected response for
ocsp.int-x3.letsencrypt.org

The problem only exists for 30-120 seconds during high load and then
disappears again.
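
For what it's worth, the stapled OCSP response can be checked from another
machine during such a window with something like the following
(lb.example.com is just a placeholder, not one of our real vhosts):

# show the stapled OCSP response (if any) sent during the TLS handshake
openssl s_client -connect lb.example.com:443 -servername lb.example.com -status < /dev/null 2>/dev/null | grep -i -A 6 'OCSP response'

A healthy server should report "OCSP Response Status: successful" there.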

Sometimes we see this in the kernel log:
Feb  1 17:11:04 kt104 kernel: [1033003.285044] TCP: request_sock_TCP:
Possible SYN flooding on port 443. Sending cookies.  Check SNMP counters.

But on other occasions we don't see any kernel log messages at all.
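
The "Check SNMP counters" hint from the kernel can be followed up with nstat.
My plan is to compare these counters before and during an incident, roughly
like this (the grep pattern is just a first guess):

# absolute values, including zero counters; the interesting ones are e.g.
# TcpExtSyncookiesSent, TcpExtListenOverflows and TcpExtListenDrops
nstat -az | grep -Ei 'syncookie|listen'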

On both the Debian 9 and Debian 10 servers we applied identical TCP tuning:

# Kernel tuning settings
# https://www.nginx.com/blog/tuning-nginx/
net.core.rmem_max=26214400
net.core.wmem_max=26214400
net.ipv4.tcp_rmem=4096 524288 26214400
net.ipv4.tcp_wmem=4096 524288 26214400
net.core.somaxconn=1000
net.core.netdev_max_backlog=5000
net.ipv4.tcp_max_syn_backlog=10000
net.ipv4.ip_local_port_range=16000 61000
net.ipv4.tcp_max_tw_buckets=2000000
net.ipv4.tcp_fin_timeout=30
net.core.optmem_max=20480
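
One thing I am not sure about: as far as I understand, these sysctls only
raise the kernel limits, while nginx still passes its default backlog of 511
to listen() on Linux unless it is set explicitly, so the accept queue could
overflow during a burst even with somaxconn=1000. A sketch of what I mean,
with an example value (this is not in our current config):

server {
        # backlog= raises the accept queue for this listener (nginx default on Linux is 511);
        # the effective value is still capped by net.core.somaxconn, so both have to be raised.
        listen 443 ssl backlog=4096;
        # ... rest of the vhost config
}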

The nginx config is exactly the same on both, so I will just show the important parts:

user www-data;
worker_processes auto;
worker_rlimit_nofile 50000;
pid /run/nginx.pid;

events {
        worker_connections 5000;
        multi_accept on;
        use epoll;
}

http {
        root /var/www/loadbalancer;

        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        types_hash_max_size 2048;
        server_tokens off;
        client_max_body_size 5m;

        client_header_timeout 20s; # default 60s
        client_body_timeout 20s; # default 60s
        send_timeout 20s; # default 60s

        include /etc/nginx/mime.types;
        default_type application/octet-stream;

        ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
        ssl_session_timeout 1d;
        ssl_session_cache shared:SSL:100m;
        ssl_buffer_size 4k;
        ssl_dhparam /etc/nginx/dhparam.pem;
        ssl_prefer_server_ciphers on;
        ssl_ciphers 'ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:!DSS';

        ssl_session_tickets on;
        ssl_session_ticket_key /etc/nginx/ssl_session_ticket.key;
        ssl_session_ticket_key /etc/nginx/ssl_session_ticket_old.key;

        ssl_stapling on;
        ssl_stapling_verify on;
        ssl_trusted_certificate /etc/ssl/rapidssl/intermediate-root.pem;

        resolver 8.8.8.8;

        log_format custom '$host $server_port $request_time $upstream_response_time $remote_addr '
                          '"$http2" "$ssl_session_reused" $upstream_addr $time_iso8601 '
                          '"$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"';

        access_log /var/log/nginx/access.log custom;
        error_log /var/log/nginx/error.log;

        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_path /var/cache/nginx/ levels=1:2 keys_zone=imagecache:10m inactive=7d use_temp_path=off;
        proxy_connect_timeout 10s;
        proxy_read_timeout 20s;
        proxy_send_timeout 20s;
        proxy_next_upstream off;

        map $http_user_agent $outdated {
                default                         0;
                "~MSIE [1-6]\."                 1;
                "~Mozilla.*Firefox/[1-9]\."     1;
                "~Opera.*Version/[0-9]\."       1;
                "~Chrome/[0-9]\."               1;
        }

        include sites/*.conf;
}
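
Regarding the OCSP errors, one thing I am considering (untested, the values
are only guesses) is giving the stapling resolver a fixed cache time and an
explicit timeout, so that a slow DNS answer under load does not break
stapling:

        resolver 8.8.8.8 valid=300s;
        resolver_timeout 5s;

Alternatively the OCSP response could be fetched by a cron job and served via
ssl_stapling_file, but I have not tried that either.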


The upstream timeouts suggest a problem with our Java machines. But at the
same time the Debian 9 nginx load balancer is running fine and has no
problems connecting to any of the upstream servers.
The Let's Encrypt and SSL_write errors, on the other hand, point to a problem
with nginx, TCP, or something else on the system.
I really don't know how to debug this situation, but we can reliably
reproduce it most of the times we hit high load on the Debian 10 servers,
and we have never seen it on Debian 9.
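
My rough plan for the next incident is to watch the listen queue and the
half-open connections on the Debian 10 machines while it happens, something
like:

# accept queue of the TLS listener: Recv-Q = currently queued, Send-Q = configured backlog
ss -lnt 'sport = :443'

# number of half-open connections (SYN_RECV)
ss -tn state syn-recv | wc -l

# overall TCP summary
ss -s

Maybe that will show whether the kernel or nginx is the bottleneck.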

Then I installed the stable version nginx 1.16 on Debian 10 to see whether
this is an nginx bug that has already been fixed:

nginx version: nginx/1.16.1
built by gcc 8.3.0 (Debian 8.3.0-6) 
built with OpenSSL 1.1.1c  28 May 2019 (running with OpenSSL 1.1.1d  10 Sep
2019)
TLS SNI support enabled
configure arguments: ...

But it didn't help.

Can somebody help me and give me some hints on how to start further debugging
of this situation?

regards
Janning

Posted at Nginx Forum: https://forum.nginx.org/read.php?2,286893,286893#msg-286893


