(Sorry for the late reply, I was busy with OCSP stapling and am only now catching up with the resulting mail backlog.)
On Tue, Aug 21, 2012 at 11:36:49AM -0700, Anton Jouline wrote:
hello, Nginx developers,
submitting this patch for your review. It changes the behaviour of peer selection for the case when the upstream is defined implicitly by the proxy_pass directive, like this:
    resolver 172.16.0.23;
    set $backend backend.servers.example.com;
    proxy_pass http://$backend;
It turns out that the round-robin algorithm actually does not work in this case, because of the way things are implemented in src/http/ngx_http_upstream.c. The peer array is created and initialized on each request, even if a cached result of the DNS query is returned by the resolver. The upstream code has no knowledge of whether the list of IP addresses changed or not. So what ends up happening is this: let's say "backend.servers.example.com" resolves to 6 IP addresses. Then the first good one will always be used, and the rest will never even be looked at.
Surely this is a valid problem, and it needs fixing. Though I don't think we have to introduce another balancer module just for this.
I think of two possible ways to resolve this:
1. Enforce a random order of servers in ngx_http_upstream_create_round_robin_peer(). Note it's only used to create upstreams on the fly, so it should be ok (and will lead to the same result as with your patch, though it will be simpler).
2. Introduce round-robin in the resolver code before we return a cached response, much like it is done by normal DNS servers. This should also simplify other possible uses of the resolver.
Not sure which one would be better though.
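To make the two options a bit more concrete, here is a rough standalone sketch in plain C. It is not nginx code: addr_t, shuffle_addrs() and rotate_addrs() are hypothetical names made up for illustration, and the real resolver and round-robin structures look different. Option 1 corresponds to a Fisher-Yates shuffle of the address list before the peers are built; option 2 corresponds to handing out the cached list rotated by a counter that advances on each cache hit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one resolved IPv4 address. */
typedef uint32_t addr_t;

/*
 * Option 1 (sketch): put the addresses into random order before the
 * peer array is built, using a Fisher-Yates shuffle.  rand() % k has
 * a slight modulo bias, which is ignored here for brevity.
 */
void
shuffle_addrs(addr_t *addrs, size_t n)
{
    size_t  i, j;
    addr_t  tmp;

    if (n < 2) {
        return;
    }

    for (i = n - 1; i > 0; i--) {
        j = (size_t) rand() % (i + 1);

        tmp = addrs[i];
        addrs[i] = addrs[j];
        addrs[j] = tmp;
    }
}

/*
 * Option 2 (sketch): copy the cached list into dst, starting at
 * position "start" and wrapping around.  The caller is assumed to
 * keep a counter per cached name and to pass it in as "start",
 * incrementing it on every cache hit.  dst and src must not overlap.
 */
void
rotate_addrs(addr_t *dst, const addr_t *src, size_t n, size_t start)
{
    size_t  i;

    assert(dst != src);

    if (n == 0) {
        return;
    }

    start %= n;

    for (i = 0; i < n; i++) {
        dst[i] = src[(start + i) % n];
    }
}
```

With the rotation variant, successive requests for a name with 6 cached addresses would see the list starting at address 0, then 1, then 2 and so on, so a "first good one wins" policy in the upstream code would still spread the load. The shuffle variant gives a similar spread on average but without any per-name state.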