Keepalived Connections Reset after reloading the configuration (HUP Signal)
mdounin at mdounin.ru
Mon Apr 1 17:04:10 UTC 2019
On Thu, Mar 28, 2019 at 08:49:48PM -0400, darthhexx wrote:
> We are seeing some fallout from this behaviour on keep-alive connections
> when proxying traffic from remote POPs back to an Origin DC that, due to
> latency, brings about a race condition in the socket shutdown sequence. The
> result being the fateful "upstream prematurely closed connection while
> reading response header from upstream" in the Remote POP.
> A walk through of what we are seeing:
> 1. Config reload happens on the Origin DC.
> 2. Socket shutdowns are sent to all open, but not transacting, keep-alive
> connections.
> 3. Remote POP sends data on a cached connection at around the same time as
> #2, because at this point it has not received the disconnect yet.
> 4. Remote POP then receives the disconnect and errors with "upstream
> prematurely closed connection while reading response header from upstream".
> Ideally we should be able to have the Origin honour the
> `worker_shutdown_timeout` (or some other setting) for keep-alive
> connections. That way we would be able to use the `keepalive_timeout`
> setting for upstreams to ensure the upstream's cached connections always
> time out before a worker is shutdown. Would that be possible or is there
> another way to mitigate this scenario?
As per the HTTP RFC, clients are expected to be prepared for such close
events (https://tools.ietf.org/html/rfc2616#section-8.1.4). In
nginx, if an error happens when nginx tries to use a cached
connection, it automatically tries again as long as it is
permitted by "proxy_next_upstream".
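
For illustration only, a minimal sketch of what the Remote POP side might
look like with upstream keepalive and retries enabled. The upstream name
"origin_dc", the host "origin.example.com" and the listen port are
hypothetical placeholders, not taken from the original report. The
"proxy_next_upstream error timeout" line restates the documented default
and is spelled out here only to make the retry condition visible.

```nginx
# Remote POP: proxy to the Origin DC over cached (keep-alive) connections.
upstream origin_dc {                      # hypothetical upstream name
    server origin.example.com:80;         # hypothetical Origin address
    keepalive 16;                         # idle connections cached per worker
}

server {
    listen 8080;                          # hypothetical port

    location / {
        proxy_pass http://origin_dc;

        # Required for upstream keepalive to be used.
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Conditions under which a request may be tried again.
        # "error timeout" is the default value.
        proxy_next_upstream error timeout;
    }
}
```

Note that, per the proxy_next_upstream documentation, requests with a
non-idempotent method (POST, LOCK, PATCH) are normally not retried once
they have been sent to an upstream server, unless "non_idempotent" is
added to the directive. So a retry may not happen for every request type.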