Hello, nginx developers,

I am submitting this patch for your review. It changes the behaviour of peer selection in the case when the upstream is defined implicitly by the proxy_pass directive, like this:

    resolver 172.16.0.23;
    set $backend backend.servers.example.com;
    proxy_pass http://$backend;
It is important to note that this is the case when the "backend.servers.example.com" name is resolved by the nginx resolver at request time, instead of being resolved by the OS resolver when the server configuration is read at startup. My change only affects this very specific scenario; when "backend.servers.example.com" is resolved at startup, the original round-robin algorithm is still used.
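For contrast, here is a sketch of the startup-resolution case that my patch does not touch (hostname is an example only): with a literal name in proxy_pass, nginx resolves it once via the OS resolver while reading the configuration, and the stock round-robin balancer works as documented:

```nginx
location / {
    # Literal hostname: resolved once at startup by the OS resolver.
    # The resulting list of addresses is fixed for the lifetime of
    # the worker processes, and round-robin rotates over it normally.
    proxy_pass http://backend.servers.example.com;
}
```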
Why the change?
Let me explain our use case first. We are using nginx as a load balancer in front of an array of backend servers. The array is dynamic and changes over time: new servers can be launched and join the array, and old ones may get terminated. The current state of the array is defined by a DNS A record that maps "backend.servers.example.com" to a list of IP addresses. This A record also changes dynamically as the array itself changes.
Because the A record is dynamic, we cannot resolve the name at startup. Therefore, we use the nginx resolver to do DNS resolution at request time. This works great, but the problem is with selecting which backend server (peer) to proxy the request to.
It turns out that the round-robin algorithm does not actually work in this case, because of the way things are implemented in src/http/ngx_http_upstream.c. The peer array is created and initialized on each request, even when the resolver returns a cached result of the DNS query. The upstream code has no knowledge of whether the list of IP addresses has changed or not. So what ends up happening is this: say "backend.servers.example.com" resolves to 6 IP addresses. Then the first good one will always be used, and the rest will never even be looked at.
It is easy to verify this: use the nginx.debug executable, enable "debug" logging in your error_log directive, and tail the error log like this:
    $ tail /var/log/nginx/error.log | grep current
    2012/08/21 12:07:09 [debug] 25252#0: *1 get rr peer, current: 0 -5
    2012/08/21 12:07:10 [debug] 25252#0: *1 get rr peer, current: 0 -5
    2012/08/21 12:07:11 [debug] 25252#0: *1 get rr peer, current: 0 -5
    2012/08/21 12:07:12 [debug] 25252#0: *1 get rr peer, current: 0 -5
    2012/08/21 12:07:12 [debug] 25252#0: *1 get rr peer, current: 0 -5
    2012/08/21 12:07:13 [debug] 25252#0: *1 get rr peer, current: 0 -5
With my random-selection change, a peer is chosen at random on each request. Of course, random selection does not guarantee an ideal distribution of the load, but it turns out to work quite well statistically. Here are some numbers from a load-testing run with 6 servers in the backend:
    Total # of requests: 2,374,398
    ==============================
    server 2:  399,342  16.819%
    server 3:  397,854  16.756%
    server 1:  396,807  16.712%
    server 4:  396,660  16.706%
    server 5:  393,735  16.582%
    server 0:  390,000  16.425%
In our case, we don't really care whether all backend servers receive the exact same number of requests, as long as the load is approximately the same. The bad situation we were aiming to avoid is one server being overloaded while the others run idle.
The patch was originally made for 1.2.2, but I verified that it also applies without problems to 1.2.3 and 1.3.5, and works the same way with those newer versions of nginx.
Sorry about such a long message. Thanks for reading!