improve the first selection of SWRR algorithm
Maxim Dounin
mdounin at mdounin.ru
Tue Nov 10 16:41:04 UTC 2020
Hello!
On Wed, Nov 04, 2020 at 12:58:51PM +0000, 陈洁 Cjhust Chen wrote:
> Hi,
> We have improved the Smooth Weighted Round-Robin (SWRR)
> algorithm to resolve the problem in the following
> situations.
>
> Situation 1:
> upstream backend-server {
> server 1.1.1.1:8000 weight=100;
> server 2.2.2.2:8000 weight=101;
> server 3.3.3.3:8000 weight=100;
> }
>
> 1. When every machine in the cluster executes "-s reload" at
> the same time, the first selection on each machine is
> 2.2.2.2:8000, the server with the highest weight, which leads
> to a 300%+ increase in 2.2.2.2:8000 traffic.
> 2. More and more companies are implementing service discovery
> based on nginx. Adding or removing a machine also leads to a
> 300%+ increase in 2.2.2.2:8000 traffic.
>
> Situation 2:
> upstream backend-server {
> server 1.1.1.1:8000 weight=100;
> server 2.2.2.2:8000 weight=100;
> server 3.3.3.3:8000 weight=100;
> }
>
> 1. When every machine in the cluster executes "-s reload" at
> the same time, the first selection on each machine is the first
> server, 1.1.1.1:8000, which leads to a 300%+ increase in
> 1.1.1.1:8000 traffic.
> 2. More and more companies are implementing service discovery
> based on nginx. Adding or removing a machine also leads to a
> 300%+ increase in 1.1.1.1:8000 traffic.
>
> # HG changeset patch
> # User Jie Chen <cherrychenjie at didiglobal.com>
> # Date 1599813602 -28800
> # Fri Sep 11 16:40:02 2020 +0800
> # Node ID 931b0c055626657d68f886781c193ffb09245a2e
> # Parent da5e3f5b16733167142b599b6af3ce9469a07d52
> improve the first selection of SWRR algorithm
>
> diff -r da5e3f5b1673 -r 931b0c055626 src/http/ngx_http_upstream_round_robin.c
> --- a/src/http/ngx_http_upstream_round_robin.c Wed Sep 02 23:13:36 2020 +0300
> +++ b/src/http/ngx_http_upstream_round_robin.c Fri Sep 11 16:40:02 2020 +0800
> @@ -91,7 +91,7 @@
> peer[n].name = server[i].addrs[j].name;
> peer[n].weight = server[i].weight;
> peer[n].effective_weight = server[i].weight;
> - peer[n].current_weight = 0;
> + peer[n].current_weight = 0 - ngx_random() % peers->total_weight;
> peer[n].max_conns = server[i].max_conns;
> peer[n].max_fails = server[i].max_fails;
> peer[n].fail_timeout = server[i].fail_timeout;
> @@ -155,7 +155,7 @@
> peer[n].name = server[i].addrs[j].name;
> peer[n].weight = server[i].weight;
> peer[n].effective_weight = server[i].weight;
> - peer[n].current_weight = 0;
> + peer[n].current_weight = 0 - ngx_random() % peers->total_weight;
> peer[n].max_conns = server[i].max_conns;
> peer[n].max_fails = server[i].max_fails;
> peer[n].fail_timeout = server[i].fail_timeout;
>
Thank you for your patch.
In no particular order:
- Traffic on a particular server is not expected to be noticeably
increased after nginx restart / configuration reload unless
there are very few requests.
- Further, given that a reload happens at some random time, adding
another random value is not going to help. That is, the patch seems
to only improve things if nginx is reloaded after a small, non-random
number of requests.
- Using "peers->total_weight" for backup peers is wrong.
- Using the same current_weight for all worker processes is
essentially the same problem as the one you are trying to solve.
- The patch breaks the "sum of all current weights is 0"
invariant. This is not fatal, yet complicates things for no
obvious reason.
- In general, it might be a better idea to use the random balancer
if you are indeed facing the problems described
(http://nginx.org/r/random).
--
Maxim Dounin
http://mdounin.ru/