Keepalived Connections Reset after reloading the configuration (HUP Signal)

Mon Apr 1 17:04:10 UTC 2019

Hello!

On Thu, Mar 28, 2019 at 08:49:48PM -0400, darthhexx wrote:

> Hi,
> 
> We are seeing some fallout from this behaviour on keep-alive connections
> when proxying traffic from remote POPs back to an Origin DC: due to
> latency, there is a race condition in the socket shutdown sequence. The
> result is the fateful "upstream prematurely closed connection while
> reading response header from upstream" error in the Remote POP.
> 
> A walkthrough of what we are seeing:
> 
> 1. Config reload happens on the Origin DC.
> 2. Socket shutdowns are sent to all open, but not transacting, keep-alive
> connections.
> 3. Remote POP sends data on a cached connection at around the same time as
> #2, because at this point it has not received the disconnect yet.
> 4. Remote POP then receives the disconnect and errors with "upstream
> prematurely..".
> 
> Ideally we should be able to have the Origin honour the
> `worker_shutdown_timeout` (or some other setting) for keep-alive
> connections. That way we would be able to use the `keepalive_timeout`
> setting for upstreams to ensure the upstream's cached connections always
> time out before a worker is shut down. Would that be possible or is there
> another way to mitigate this scenario?

As per the HTTP RFC, clients are expected to be prepared for such
close events (https://tools.ietf.org/html/rfc2616#section-8.1.4).
In nginx, if an error happens when nginx tries to use a cached
connection, it automatically tries again, as long as this is
permitted by "proxy_next_upstream"
(http://nginx.org/r/proxy_next_upstream).
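
For illustration, a minimal sketch of what the Remote POP side
might look like. The upstream name, hostname and numeric values
here are assumptions, not taken from the configuration discussed
above. The "keepalive_timeout" directive in the upstream block
requires nginx 1.15.3 or newer.

```nginx
# Hypothetical Remote POP configuration; names and values are
# placeholders chosen for illustration only.
upstream origin_dc {
    server origin.example.com:80;

    # Cache up to 32 idle connections to the origin per worker.
    keepalive 32;

    # Close cached connections that stay idle longer than this
    # (upstream-block directive, nginx 1.15.3+).
    keepalive_timeout 30s;
}

server {
    listen 80;

    location / {
        proxy_pass http://origin_dc;

        # Both are needed for upstream keep-alive connections.
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # "error timeout" is the default; it is spelled out here
        # only to make the retry conditions explicit.
        proxy_next_upstream error timeout;
    }
}
```

Note that requests with a non-idempotent method (POST, LOCK,
PATCH) are normally not retried once the request has been sent
to the upstream server, unless the "non_idempotent" parameter is
added to "proxy_next_upstream".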

-- 
Maxim Dounin
http://mdounin.ru/