Nginx - 56 day old reverse-proxy suddenly unable to connect upstream.

bdarbro nginx-forum at
Fri Feb 21 21:19:46 UTC 2020

I have nginx configured as a reverse proxy to Amazon's AWS IoT MQTT service.
This was functioning well for almost 2 months, when suddenly 20 out of 32
instances stopped being able to connect upstream.  We started seeing
sporadic upstream SSL connection errors, followed by sporadic upstream
connection refused errors, and then finally, mostly connection timeouts to
upstream.  Nothing short of a restart or reload of nginx fixes this.  Debug
logging is not enabled, and trying to enable it replaces the worker
processes, which effectively ends the issue.  Over the next 3 days, the
remaining nodes started exhibiting this problem as well.  Rather than
restarting nginx on these remaining nodes, I isolated them for study and
stood up new nodes to replace them.
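
One way to see how the mix of upstream errors shifts on an affected node is to tally the error log by failure category. The following is only a minimal sketch in Python. The function name `tally_upstream_errors`, the category labels, and the match strings are assumptions for illustration: the strings follow the usual wording of nginx upstream error messages, not lines taken from these nodes.

```python
from collections import Counter

# Substrings assumed to identify each failure mode in an nginx error log.
# Adjust them to the wording actually present in /var/log/nginx/error.log.
PATTERNS = {
    "ssl_handshake": "SSL handshaking to upstream",
    "refused": "Connection refused",
    "timeout": "upstream timed out",
}


def tally_upstream_errors(lines):
    """Count log lines per failure category; the first matching pattern wins."""
    counts = Counter()
    for line in lines:
        for category, needle in PATTERNS.items():
            if needle in line:
                counts[category] += 1
                break
    return counts
```

Passing an open file object for the error log (it iterates line by line) returns a `Counter`, so running it on log slices from different days would show whether the errors really progress from SSL failures to refusals to timeouts.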

But in studying these, I cannot find any indicator as to why this is
happening.  Now that these have been removed from client traffic, I can
test with curl.  I can hit one of these 5 times, and by the 5th call, I
get a repro: a connection timeout to the upstream, resulting in a timeout to
the client.
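
The repeated-request test described above could be scripted so that each isolated node is probed the same way. This is a hedged sketch, not the procedure actually used here: `first_failure` and `https_probe` are hypothetical helper names, and the URL and timeout passed to `https_probe` are whatever the tester chooses.

```python
import time
import urllib.request


def first_failure(probe, attempts=5):
    """Call probe() up to `attempts` times.

    Returns (attempt_number, elapsed_seconds, exception) for the first call
    that raises, or None if every call succeeds.
    """
    for attempt in range(1, attempts + 1):
        started = time.monotonic()
        try:
            probe()
        except Exception as exc:
            return attempt, time.monotonic() - started, exc
    return None


def https_probe(url, timeout=30.0):
    """Build a probe that fetches `url`; HTTP error statuses raise HTTPError."""
    def probe():
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
    return probe
```

Calling `first_failure(https_probe(url), attempts=5)` against one node reports which request first failed and how long it took, which makes it easier to compare nodes than reading curl output by eye.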

Here is the version information for nginx, as it comes from Ubuntu 18.04:
nginx version: nginx/1.14.0 (Ubuntu)
built with OpenSSL 1.1.1  11 Sep 2018
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2
-fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time
-D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro
-Wl,-z,now -fPIC' --prefix=/usr/share/nginx
--conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log
--error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock
--pid-path=/run/ --modules-path=/usr/lib/nginx/modules
--http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit
--with-http_ssl_module --with-http_stub_status_module
--with-http_realip_module --with-http_auth_request_module
--with-http_v2_module --with-http_dav_module --with-http_slice_module
--with-threads --with-http_addition_module --with-http_geoip_module=dynamic
--with-http_gunzip_module --with-http_gzip_static_module
--with-http_image_filter_module=dynamic --with-http_sub_module
--with-http_xslt_module=dynamic --with-stream=dynamic
--with-stream_ssl_module --with-mail=dynamic --with-mail_ssl_module

nginx.conf file:
user www-data;
worker_processes auto;
pid /run/;
include /etc/nginx/modules-enabled/*.conf;
worker_rlimit_nofile 30500;

events {
	worker_connections 10000;
	# multi_accept on;
}

http {
	sendfile on;
	tcp_nopush on;
	tcp_nodelay on;
	keepalive_timeout 65;
	types_hash_max_size 2048;

	include /etc/nginx/mime.types;
	default_type application/octet-stream;

    #IPV6 also disabled via kernel boot option and sysctl, too.
    #Couldn't get nginx to stop AAAA lookups without doing that.
    resolver valid=3s ipv6=off;
    resolver_timeout 10;
    # enable reverse proxy
    proxy_redirect              off;
    proxy_set_header            Host  ;
    proxy_set_header            X-Real-IP       $remote_addr;
    proxy_set_header            X-Forwared-For  $proxy_add_x_forwarded_for;

	ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
	ssl_prefer_server_ciphers on;

	access_log /var/log/nginx/access.log;
	error_log /var/log/nginx/error.log error;

	gzip on;

	# Nginx-lua-prometheus
	# Prometheus metric library for Nginx
	lua_shared_dict prometheus_metrics 10M;
	lua_package_path "/etc/nginx/nginx-lua-prometheus/?.lua";
	init_by_lua '
	  prometheus = require("prometheus").init("prometheus_metrics")
	  metric_requests = prometheus:counter(
	    "nginx_http_requests_total", "Number of HTTP requests", {"host", "status"})
	  metric_latency = prometheus:histogram(
	    "nginx_http_request_duration_seconds", "HTTP request latency", {"host"})
	  metric_connections = prometheus:gauge(
	    "nginx_http_connections", "Number of HTTP connections", {"state"})
	';
	log_by_lua '
	  metric_requests:inc(1, {ngx.var.server_name, ngx.var.status})
	';

	include /etc/nginx/conf.d/*.conf;
	include /etc/nginx/sites-enabled/*;
}

iot-proxy config file:
    # Define group of backend / upstream servers:
    upstream iot-backend {
        ...
    }

    server {
        #listen      443 default ssl;
        listen      443 ssl;

        ssl_session_cache    shared:SSL:1m;
        ssl_session_timeout  86400;
        ssl_certificate /etc/nginx/ssl/CENSORED.crt;
        ssl_certificate_key /etc/nginx/ssl/CENSORED.key;
        ssl_verify_client off;
        ssl_protocols        SSLv3 TLSv1 TLSv1.1 TLSv1.2;
        ssl_ciphers RC4:HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers on;

        location / {
            proxy_pass  https://iot-backend;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header Host "";
            proxy_read_timeout 86400;
            proxy_ssl_session_reuse off;
        }
    }

nginx-lua-prometheus config file:
server {
  listen 9145;
  deny all;
  location /metrics {
    content_by_lua '
      metric_connections:set(ngx.var.connections_reading, {"reading"})
      metric_connections:set(ngx.var.connections_waiting, {"waiting"})
      metric_connections:set(ngx.var.connections_writing, {"writing"})
    ';
  }
}

Posted at Nginx Forum:,287081,287081#msg-287081
