Re-balancing Upstreams in TCP Loadbalancer

Balaji Viswanathan balaji.viswanathan at gmail.com
Thu Sep 15 06:13:51 UTC 2016


Hello Nginx Users,

I am running nginx as a TCP load balancer. I am trying to find a way to
redistribute client TCP connections across upstream servers, specifically
to rebalance the load on the upstream servers (on some event) when clients
are using persistent TCP connections.

The scenario is as follows:

Application protocol - Clients and servers use a stateful application
protocol on top of TCP which is resilient to TCP disconnections, i.e., the
client and server do application-level acks, so if some 'unit' of work is
not completely transferred, it will be retransferred by the client.

Persistent TCP connections - The client opens TCP connections which are
persistent, with a few bytes being transferred intermittently. Getting the
latest data quickly is important, hence I would like to avoid frequent
(re)connections (both due to connection setup overhead and varying resource
usage). A typical connection lasts for days.

Maintenance/Downtime - When one of the upstream servers is shut down for
maintenance, all its client connections break, and the clients reconnect
and switch to one of the remaining active upstream servers. When the
upstream is brought back up after maintenance, the load isn't
redistributed, i.e., existing connections (since they are persistent)
remain with the other servers. Only new connections can go to the restored
server. This is more pronounced in a 2 upstream server setup, where all
connections switch between servers, somewhat like the thundering herd
problem.

I would like to have the ability to terminate some/all client connections
explicitly and have them reconnect. I understand that with nginx
maintaining 2 connections for every client, there might not be a 'clean'
time to close the connection, but since there is an application ack on
top, an unclean termination is acceptable. I currently have to restart
nginx to rebalance the upstreams, which is effectively the same.

Restarting all upstream servers and synchronizing their startup is
non-trivial. So is signalling all clients (1000s) to close and reconnect.
In nginx, I can achieve this partially by disabling keepalive on the nginx
listen port (so_keepalive=off) and then using least_conn as the
load-balancing method on my upstream. However, this is not desirable in
steady state (see persistent TCP connections above), and even though
connections get evenly distributed, the load might not be, as idle and busy
clients will end up with different upstreams.
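
A minimal sketch of the partial workaround described above, assuming a
plain stream (TCP) proxy setup. The upstream name "backend", the server
addresses, and the listen port are hypothetical placeholders and not taken
from the actual deployment:

```nginx
stream {
    upstream backend {
        # Send each new connection to the server that currently has
        # the fewest active connections.
        least_conn;

        # Placeholder addresses (documentation range), not real servers.
        server 192.0.2.10:9000;
        server 192.0.2.11:9000;
    }

    server {
        # so_keepalive=off turns off TCP keepalive on the listening
        # socket; 12345 is an arbitrary example port.
        listen 12345 so_keepalive=off;
        proxy_pass backend;
    }
}
```

Note that least_conn only influences where new connections are placed. It
does not move connections that are already established, which is why this
helps only once clients actually reconnect.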

Nginx Plus features like "on-the-fly configuration" (upstream_conf) allow
one to change the upstream configuration, but that doesn't affect existing
connections, even if a server is marked as down. "Draining of sessions" is
only applicable to HTTP requests and not to TCP connections.

Did anyone else face such a problem? How did you resolve it? Any pointers
will be much appreciated.

thanks,
balaji

-- 
Balaji Viswanathan
Bangalore
India