Nginx Reverse Proxy - Stale proxy_pass URL

Ben Mills ben at bitbrew.com
Mon Mar 7 19:53:04 UTC 2022


Greetings nginx,

nginx version: nginx/1.18.0, running on an AWS EC2 instance with an Amazon Linux 2 AMI.

We are using the nginx.conf below for reverse proxying and mutual TLS authentication of some specialized mobile devices.

   server {
     listen                    443 ssl;
     server_name               serviceapi.company.com;
     root                      /usr/share/nginx/html/....;
     index                     app.php app_dev.php config.php;

     location / {
       proxy_pass https://upstream;
     }

     ssl_protocols             TLSv1 TLSv1.1 TLSv1.2;
     ssl_certificate           /etc/pki/nginx/private/...crt;
     ssl_certificate_key       /etc/pki/nginx/private/...key;
     ssl_client_certificate    /etc/pki/nginx/private/...pem;

     ssl_verify_client         on;
     ssl_prefer_server_ciphers on;
     ssl_session_cache         shared:SSL:1m;
     ssl_session_timeout       5m;
     ssl_ciphers               HIGH:!aNULL:!MD5;
     ssl_verify_depth          3;
   }
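
For context, this is roughly how we smoke-test the mutual TLS endpoint from a shell (the client cert/key paths below are placeholders, not our real device credentials):

curl --cert /path/to/device-client.crt \
     --key /path/to/device-client.key \
     https://serviceapi.company.com/<path>/pending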

This works well but has one critical issue. The hostname in the proxy_pass URL (upstream) is an endpoint in AWS Route 53, defined by an API Gateway that is fronted by an ELB. That is, https://upstream resolves to the IPv4 addresses of an ELB in AWS. The problem is that nginx only resolves this endpoint when it starts. Let's say:

dig upstream +short
1.2.3.4
1.2.3.5

As long as these two ELB IPs do not change, device traffic is proxied to the upstream without issue. However, if the ELB resource is recreated in AWS and the IPs change:

dig upstream +short
6.7.8.9
6.7.8.10

then nginx starts logging errors like:

2022/03/04 20:57:21 [error] 18352#0: *30682 connect() failed (111: Connection refused) while connecting to upstream, client: <client_ip>, server: serviceapi.company.com, request: "GET /<path>/pending HTTP/1.1", upstream: "https://1.2.3.4/<path>/pending", host: "<endpoint-used-by-devices>"

The nginx service cached 1.2.3.4 at startup, and the fact that https://upstream now resolves to different IPs has broken the proxy. Restarting the nginx service fixes the issue, since nginx then resolves https://upstream to the new ELB IPs.
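
For completeness, the restart we run is just the stock systemd unit that ships with the standard Amazon Linux 2 nginx package:

sudo systemctl restart nginx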

Question-1

Is there a directive we can add to our nginx.conf server block that will force nginx to re-resolve the proxy_pass hostname upon error? If not upon error, then perhaps at a configurable time interval?

I have my eye on proxy_cache_use_stale, but I am not sure it fits our use case, since it governs serving stale cached responses rather than stale DNS entries.
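
One approach I have come across elsewhere (untested on our side, so treat it as an assumption) is to configure a resolver and move the hostname into a variable; when proxy_pass contains a variable, nginx resolves the hostname at runtime through that resolver instead of once at startup. The resolver address below is the AWS VPC-provided DNS endpoint, which may or may not be appropriate for our setup:

   resolver 169.254.169.253 valid=30s;  # VPC DNS; cache lookups for at most 30s

   location / {
     set $backend "https://upstream";   # variable forces runtime resolution
     proxy_pass $backend;
   }

Is that the recommended way to handle this, or is there something more direct?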

Question-2

The devices using this setup are specialized, and testing is not easy. Is there a command-line option that would let a user with SSH access to the EC2 instance where nginx is running verify which addresses nginx currently holds for https://upstream? (i.e. rather than having to wait for a real device to hit the error). The access.log does not show this information; only the error.log does.
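
The closest workaround I have come up with myself (an assumption on my part, not something from the docs) is to compare what DNS returns now against the addresses the nginx workers are actually connected to:

# What the record resolves to right now:
dig upstream +short

# Established connections held by nginx processes:
sudo ss -tnp | grep nginx

If ss shows connections to IPs that dig no longer returns, the cached addresses are presumably stale, though this only works while there is live device traffic.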

Thanks!