Nginx reverse proxy crash when dns unavailable

Fri Oct 23 04:13:30 MSD 2009

Hello!

On Thu, Oct 22, 2009 at 03:53:32PM -0400, masom wrote:

> But shouldn't nginx start anyway if the end point is not responding and just try to reach it anyway?
>
> I can't really see why it would need to stop or crash when either the endpoint (apache) or the dns system is unavailable.
> 
> Yes it should display 5xx errors saying the endpoint is unreachable (dns or server failure / not-responding) but nginx should not "lock up" after 1 bad answer.
> 
> Current problem:
> 
> unit starts
> dhcp kicks in
> nginx gets started before the dhcp process is completed
> nginx realizes that content.dev.local is not reachable (dns settings are not yet set by dhcp)
> nginx exits
> Browser on unit starts, says address is unreachable (as nginx did not start).
> 
> 
> Shouldn't nginx just attempt to connect to the end point as requests are coming in?

I probably didn't explain it well enough.

When nginx has something it may attempt to connect to, it will 
happily work.  But if name resolution fails during configuration 
parsing, it just doesn't have an ip.

When you write in the config something like

    location /pass-to-backend/ {
        proxy_pass http://backend;
    }

the hostname "backend" is resolved during config parsing via the 
standard function gethostbyname().  This function is blocking and 
therefore can't be used during request processing in nginx 
workers, as it would block all clients for an unknown period of 
time.  So this function is only used during config parsing: the 
hostname "backend" is resolved to ip address[es], and later, 
during request processing, these addresses are used without 
further DNS lookups.
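
To illustrate what is and isn't looked up at parse time, here is 
a minimal sketch.  The location names and the address 192.0.2.10 
are placeholders made up for this example, not something from 
your setup:

    # "backend" goes through gethostbyname() while the config
    # is parsed; if the lookup fails, nginx refuses to start.
    location /by-name/ {
        proxy_pass http://backend;
    }

    # A literal ip needs no lookup at all, so this one can't
    # fail on DNS during config parsing.
    location /by-ip/ {
        proxy_pass http://192.0.2.10;
    }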

If "backend" can't be resolved during config parsing there are 
basically two options:

1. Work as is, always returning 502 when a user tries to access a 
uri that should be proxied.  We have no ip to connect() to, 
remember?

2. Refuse to start, assuming the administrator will fix the 
problem and start us normally.

Option (1) is probably better in situations where you have an 
improperly configured system, without any reliability 
implemented, that has to start unattended at any cost and do at 
least something.

But it's not really wise to do (1) in a normal situation.  It 
will basically start the service in a broken and almost 
undetectable state.  Consider the case where it's part of a big 
cluster: a new node comes up and seems to work, but for some 
requests it returns errors for no apparent reason.  It's an 
administrative nightmare.

On the other hand, during reconfiguration, configuration testing, 
binary upgrade and other attended operations the only sensible 
thing to do is certainly (2).  You wrote a hostname in the config 
that can't be resolved - it's just a configuration error.

Note well: there is quite a different mode of proxy_pass 
operation, proxy_pass with variables, which may use nginx's 
internal async resolver.  In this mode nginx won't try to 
resolve hostnames during configuration parsing, and nginx will 
start perfectly even when dns isn't available.  But this

a) requires additional configuration (you have to configure the 
ip of your DNS server via the resolver directive);

b) is much more resource consuming;

c) depends on the internal nginx resolver, which is known to have 
problems, at least in the stable branch.

Therefore I can't recommend using it in production.
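
For reference, a rough sketch of what that mode looks like.  The 
resolver address 192.0.2.53 and the variable name $backend_host 
are placeholders I made up for this example; substitute the ip of 
your own DNS server:

    # ip of the DNS server for nginx's internal resolver
    # (placeholder address)
    resolver 192.0.2.53;

    location /pass-to-backend/ {
        # Because proxy_pass contains a variable, "backend" is
        # not resolved during config parsing; it is looked up
        # via the resolver while requests are processed.
        set $backend_host "backend";
        proxy_pass http://$backend_host;
    }

With this nginx starts even if "backend" can't be resolved yet, 
and requests will get errors until the name resolves.  The 
caveats above still apply.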

> The solution we consider is the hosts file that would always point to a static ip for the content server, but would be a little management problem as we are deploying in several different locations with different networks.

I don't really understand why you don't just impose the correct 
prerequisites before starting nginx.  It's not really hard to 
wait until the network comes up.

Maxim Dounin