feature request: warn when domain name resolves to several addresses

Wed Nov 20 19:28:31 UTC 2019

Hello!

On Tue, Nov 19, 2019 at 07:26:35PM -0700, Roger Pack wrote:

> On Tue, Nov 19, 2019 at 12:01 PM Maxim Dounin <mdounin at mdounin.ru> wrote:
> 
> > On Tue, Nov 19, 2019 at 10:47:01AM -0700, Roger Pack wrote:
> >
> > > I noticed that in ngx_http_proxy_module
> > >
> > > proxy_pass http://localhost:8000/uri/;
> > > "If a domain name resolves to several addresses, all of them will be
> > > used in a round-robin fashion. In addition, an address can be
> > > specified as a server group."
> > >
> > > However this can be confusing for end users who innocently put the
> > > domain name "localhost" then find that round-robin across ipv6 and
> > > ipv4 is occurring, ref:
> > > https://stackoverflow.com/a/58924751/32453
> >
> > This seems to be your own answer, and it looks incorrect to me.
> > In particular, the 499 error is logged when the client closes
> > connection, and there is no need to have more than one backend
> > server specified to see 499 errors.
> 
> True, those cases were covered in some other answers to that question,
> but I'll add a note. :)
> It can also be logged when the backend server times out, at least
> empirically that seems to be the case...
> see also https://serverfault.com/questions/523340/post-request-is-repeated-with-nginx-loadbalanced-server-status-499/783624#783624

It is logged only when the client closes the connection.  But the 
reasons why the client closes the connection may differ.

In particular, when the backend server times out, it means that 
processing the request takes a long time.  And if processing 
takes time, it is likely that the client will give up waiting and 
will close the connection, resulting in 499.

> > > https://stackoverflow.com/a/52550758/32453
> >
> > Changing "localhost" to "127.0.0.1" here "works" because having just
> > one address triggers slightly different logic in the upstream
> > code: with just one address, max_fails / fail_timeout logic is
> > disabled, and nginx always uses the (only) address available, even
> > if there are errors.
> 
> Right.  The confusion in my mind is that people configuring Nginx will
> use one backend "localhost", and assume they have set it up for a
> "single server" type server group.
> Since they have listed only one host.  But it has not...
> See for instance https://stackoverflow.com/a/52550758
> 
> > The underlying problem is still the same though: backends cannot
> > cope with the load, and there are errors.
> 
> Right.  However with the "single server" scenario this behavior is
> handled differently (it doesn't exhaust the server group of available
> servers and begin to return with 502's exclusively for a time, as it
> did in my instance...).
> 
> Basically if, while setting it up, you happen to forward to 127.0.0.1,
> it will work fine, no "periods of 502's" (though you may get some
> 504's).
> 
> But if you forward it to "localhost" you may be surprised one day to
> discover that you are getting "periods of 502's" if any connections
> timeout (> 60s) for any reason.  Since only 2 of those and your entire
> server group has been exhausted.

I don't think people know and/or expect the difference in handling 
between single address and multiple addresses, regardless of 
whether they know there are multiple addresses, or not.  As such, 
a configuration-time warning won't help.

Rather, we can consider explaining the difference.  Alternatively, 
we can make it go away - either by changing the single-address case 
to be identical to the multiple-addresses one, or vice versa.  Or even 
by making this configurable.

(Actually, the multiple-addresses case was previously handled 
differently, closer to the single-address approach, and resulted 
in just one 502, with "quick recovery" of all servers on the first 
request.  But some time ago this was changed to follow 
fail_timeout instead, as quick recovery of all servers seems to 
cause more harm than good in most configurations.)
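
For illustration, the cases discussed above could be written out as in 
the following sketch.  It is not a recommendation.  The port and URI 
are taken from the quoted proxy_pass example; the location prefixes 
and the upstream name "backend_single" are hypothetical.  It assumes 
"localhost" resolves to both 127.0.0.1 and ::1 on the host in question.

```nginx
# Fragment only: "upstream" belongs in the http context,
# "location" blocks belong inside a server block.

# Multiple addresses with failure accounting switched off explicitly:
# max_fails=0 disables the accounting of failed attempts, so the
# addresses are not marked unavailable after errors.
upstream backend_single {
    server localhost:8000 max_fails=0;
}

# Single literal address: the max_fails / fail_timeout logic is not
# applied, and nginx keeps using this address even if there are errors.
location /single/ {
    proxy_pass http://127.0.0.1:8000/uri/;
}

# Name resolving to several addresses: handled as a round-robin group,
# and each address can be disabled for fail_timeout after errors.
location /multiple/ {
    proxy_pass http://localhost:8000/uri/;
}

# Using the explicit group defined above.
location /explicit/ {
    proxy_pass http://backend_single/uri/;
}
```

Whether switching the accounting off is appropriate depends on the 
setup; it only changes how nginx reacts to backend errors, it does 
not address the errors themselves.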

> > (And no, it's not a DNS failure - DNS is only used when nginx
> > resolves the name in the proxy_pass directive while parsing
> > configuration on startup.)
> >
> > > Suggestion/feature request: If a domain name resolves to several
> > > addresses, log a warning in error.log file somehow, or at least in the
> > > output of -T, to warn  somehow.  Then there won't be unexpected
> > > round-robins occurring and "supposedly single" servers being
> > > considered unavailable due to timeouts, surprising people like myself.
> >
> > Multiple addresses are fairly normal, and I don't think that
> > logging a warning is a good idea.
> 
> I'm just saying...it might help somebody like me out, in the future.
> There be dragons...or maybe the default error log could be configured
> to make it more obvious to people what is going on?
> (https://stackoverflow.com/a/52550758)

From the error log things are expected to be pretty obvious - 
nginx logs the original errors, and it also logs when it cannot 
pick an upstream server to use ("no live upstreams", which means 
"all upstream servers are disabled due to errors").  Further, it 
also logs when it disables a server, though it happens on the 
"warn" level.
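
As a minimal sketch, assuming the error log is left at its default 
"error" level (at which "warn" messages are not written), the level 
can be lowered so that messages about disabled servers appear as 
well.  The log path below is an assumption and will differ between 
installations.

```nginx
# main or http context; the path is only an example
error_log /var/log/nginx/error.log warn;
```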

The main problem is that people hardly look into error logs at 
all.  For example, the answer you are referring to only provides 
access log information, and this is what makes it confusing.  On 
the other hand, another answer to the same question is based on 
the "no live upstreams" error message from the question, and 
correctly refers to the max_fails/fail_timeout parameters.

-- 
Maxim Dounin
http://mdounin.ru/