Help with UDP load balancing passive health checks

Sergey Kandaurov pluknet at nginx.com
Thu Feb 24 08:30:03 UTC 2022


> On 23 Feb 2022, at 06:45, Pawel Fraczek <fraczekp at gmail.com> wrote:
> 
> Hi, I'm trying to build a syslog load balancer and I'm running into issues with the failover of UDP messages. TCP works just fine: when a server goes down, all messages fail over to the active server. But with UDP, that does not happen. Maybe someone can point out what I'm doing wrong. Below is the config.
> stream {
>     upstream syssrv {
>         server 192.168.167.108:5500 max_fails=2 fail_timeout=15s;
>         server 192.168.167.109:5500 max_fails=2 fail_timeout=15s;
>     }
>     server {
>         listen 5500;
>         proxy_protocol on;
>         proxy_pass syssrv;
>         proxy_timeout 1s;
>         proxy_connect_timeout 1s;
>     }
>     server {
>         listen 5500 udp;
>         proxy_pass syssrv;
>         proxy_timeout 1s;
>         proxy_connect_timeout 1s;
>         proxy_bind $remote_addr transparent;
>     }
> }
> 
> I have a script that numbers each message (n), like this: "Testing -proto: udp - n"
> I see both servers getting messages when they are online (even and odd numbers), but when one goes down, the other server continues to get only the even numbers, so I'm losing 50% of the messages.
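The sending script is not shown in the thread. A minimal hypothetical
stand-in, assuming plain UDP datagrams in the message format quoted
above; the function names send_numbered() and missing() are made up
for illustration:

```python
import socket

def make_message(n: int) -> bytes:
    # Same format as described in the question: "Testing -proto: udp - n"
    return f"Testing -proto: udp - {n}".encode()

def send_numbered(host: str, port: int, count: int) -> None:
    # One datagram per message. UDP gives no delivery confirmation,
    # so the sender itself cannot tell which messages were lost.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for n in range(count):
            sock.sendto(make_message(n), (host, port))

def missing(received: list[bytes], count: int) -> list[int]:
    # Recover the sequence numbers from the collected messages and report gaps.
    seen = {int(m.decode().rsplit("- ", 1)[1]) for m in received}
    return [n for n in range(count) if n not in seen]
```

Merging what both backends collected and passing it to missing()
makes a loss pattern like "every odd number" visible directly.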
> I tried to debug the setup and I see nginx marking that the udp packets timed out. I see this:
> 2022/02/22 20:05:13 [info] 21362#21362: *777 udp client 192.168.167.101:51529 connected to 0.0.0.0:5500
> 2022/02/22 20:05:13 [info] 21362#21362: *777 udp proxy 192.168.167.101:34912 connected to 192.168.167.108:5500
> 2022/02/22 20:05:13 [info] 21362#21362: *779 udp client 192.168.167.101:53862 connected to 0.0.0.0:5500
> 2022/02/22 20:05:13 [info] 21362#21362: *779 udp proxy 192.168.167.101:35506 connected to 192.168.167.109:5500
> Then this:
> 2022/02/22 20:05:14 [info] 21362#21362: *771 udp timed out, packets from/to client:1/0, bytes from/to client:145/0, bytes from/to upstream:0/145
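The counters in that "udp timed out" line can be read mechanically.
A small sketch, assuming only the "from/to" counter format visible in
the line above (the helper names are hypothetical): "bytes from/to
upstream:0/145" means 145 bytes went to the upstream and nothing came
back.

```python
import re

# Matches counters such as "packets from/to client:1/0" or
# "bytes from/to upstream:0/145" in an nginx stream info-level log line.
_COUNTERS = re.compile(r"(packets|bytes) from/to (client|upstream):(\d+)/(\d+)")

def parse_udp_counters(line: str) -> dict[str, tuple[int, int]]:
    """Return {"packets client": (from, to), ...} for one log line."""
    return {
        f"{kind} {side}": (int(frm), int(to))
        for kind, side, frm, to in _COUNTERS.findall(line)
    }

def upstream_was_silent(line: str) -> bool:
    # Data was sent to the upstream but nothing came back. Over UDP this
    # looks the same whether the peer is down or simply never replies,
    # which is normal for a syslog receiver.
    frm, to = parse_udp_counters(line).get("bytes upstream", (0, 0))
    return to > 0 and frm == 0
```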
> 
> But it's not redirecting the connection to the healthy server. This seems pretty simple, so any ideas what I'm doing wrong? It would seem that the non-commercial version should be able to do this, no?
> Any help is appreciated. I also tried to add a backup, but it doesn't work with UDP.

The stream module has no notion of the application protocol,
hence it only switches to the next upstream on connect() errors.

Due to the nature of the UDP protocol, which is essentially
connectionless, a peer that is down usually cannot be reported
as a connect() failure: connect() on a UDP socket only sets the
default destination address and sends nothing to the peer.
Instead, it is only seen as a connection timeout or a recv()
error while reading back from the upstream.  This means there
is no next upstream logic for UDP.
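To see this from the client side, a minimal sketch on loopback:
a bound socket that never answers stands in for an upstream that
drops traffic without sending an ICMP error (a hypothetical test
setup, not nginx itself).  connect() succeeds, and the only symptom
is that recv() times out:

```python
import socket
import time

def probe_silent_peer(timeout: float = 0.5) -> tuple[str, float]:
    # Hypothetical "silent upstream": bound, but never reads or replies.
    silent = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    silent.bind(("127.0.0.1", 0))
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket only records the default destination;
        # no packet is exchanged, so it succeeds whatever the peer's state.
        client.connect(silent.getsockname())
        client.settimeout(timeout)
        client.send(b"Testing -proto: udp - 0")
        start = time.monotonic()
        try:
            client.recv(2048)
            outcome = "reply"
        except socket.timeout:
            outcome = "timeout"  # the whole timeout elapsed with no answer
        return outcome, time.monotonic() - start
    finally:
        client.close()
        silent.close()
```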

The waiting time can be shortened if the peer reports back
with an ICMP error, such as "port unreachable".  In this case,
it is seen as a recv() error immediately, without waiting for
the connection timeout.
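A sketch of the ICMP case, assuming a loopback port with nothing
listening on it: the kernel gets "port unreachable" back and reports
it on the connected socket, as ECONNREFUSED on Linux (Windows reports
WSAECONNRESET instead).  If ICMP is filtered, this falls back to the
timeout case:

```python
import socket
import time

def probe_closed_port(timeout: float = 2.0) -> tuple[str, float]:
    # Grab a free UDP port on loopback, then release it so nothing listens there.
    tmp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tmp.bind(("127.0.0.1", 0))
    addr = tmp.getsockname()
    tmp.close()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        client.connect(addr)  # still succeeds: no packets are exchanged
        client.settimeout(timeout)
        start = time.monotonic()
        client.send(b"Testing -proto: udp - 1")
        try:
            client.recv(2048)
            outcome = "reply"
        except (ConnectionRefusedError, ConnectionResetError):
            # ICMP "port unreachable" surfaced as an immediate recv() error.
            outcome = "refused"
        except socket.timeout:
            # No ICMP error arrived, so we had to wait out the timeout.
            outcome = "timeout"
        return outcome, time.monotonic() - start
    finally:
        client.close()
```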

Either way, the failure is counted against the peer, and once
"max_fails" is reached, the peer is switched off temporarily for
subsequent connections for the duration of "fail_timeout".
This is logged as "upstream server temporarily disabled"
at the [warn] logging level.
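A simplified model of that behaviour, not nginx's actual code: the
class names, the one-message-per-second rate and the assumption that
a failure is detected instantly are all hypothetical.  It shows that
without next upstream logic every datagram sent to the dead peer is
lost, and that the peer is tried again each time "fail_timeout" runs
out:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Peer:
    name: str
    max_fails: int
    fail_timeout: float
    fails: int = 0
    checked: float = float("-inf")  # time of the most recent failure

    def available(self, now: float) -> bool:
        # Skipped only while it has max_fails failures within fail_timeout.
        return self.fails < self.max_fails or now - self.checked > self.fail_timeout

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self.fails = 0
            return
        if now - self.checked > self.fail_timeout:
            self.fails = 0  # earlier failures are outside the window
        self.fails += 1
        self.checked = now


class RoundRobin:
    def __init__(self, peers: list[Peer]) -> None:
        self.peers = peers
        self.i = 0

    def pick(self, now: float) -> Optional[Peer]:
        # Try each peer at most once, skipping temporarily disabled ones.
        for _ in range(len(self.peers)):
            peer = self.peers[self.i % len(self.peers)]
            self.i += 1
            if peer.available(now):
                return peer
        return None


def simulate(n_messages: int, interval: float,
             max_fails: int, fail_timeout: float) -> list[int]:
    """Return the numbers of messages lost while the second peer is down."""
    up = Peer("up", max_fails, fail_timeout)
    down = Peer("down", max_fails, fail_timeout)
    lb = RoundRobin([up, down])
    lost = []
    for n in range(n_messages):
        now = n * interval
        peer = lb.pick(now)
        if peer is down:
            lost.append(n)  # no retry on the other peer: the datagram is gone
            down.record(False, now)
        elif peer is None:
            lost.append(n)
        else:
            peer.record(True, now)
    return lost
```

With max_fails=2 and fail_timeout=15s from the config above,
simulate(40, 1.0, 2, 15.0) returns [1, 3, 19, 21, 37, 39]: in this
model, two messages are lost each time the dead peer is retried.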

-- 
Sergey Kandaurov


