Keepalived Connections Reset after reloading the configuration (HUP Signal)

Fri Mar 29 00:49:48 UTC 2019

Hi,

We are seeing some fallout from this behaviour on keep-alive connections
when proxying traffic from remote POPs back to an Origin DC. Because of the
latency between the two, there is a race condition in the socket shutdown
sequence, and the result is the fateful "upstream prematurely closed
connection while reading response header from upstream" error in the
Remote POP.

A walk-through of what we are seeing:

1. Config reload happens on the Origin DC.
2. Socket shutdowns are sent to all open, but not transacting, keep-alive
connections.
3. Remote POP sends data on a cached connection at around the same time as
step 2, because at this point it has not yet received the disconnect.
4. Remote POP then receives the disconnect and errors with "upstream
prematurely..".

Ideally we should be able to have the Origin honour the
`worker_shutdown_timeout` (or some other setting) for keep-alive
connections. That way we would be able to use the `keepalive_timeout`
setting for upstreams to ensure the upstream's cached connections always
time out before a worker is shut down. Would that be possible, or is there
another way to mitigate this scenario?
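
To make the idea concrete, here is a rough sketch of the kind of setup we have in mind. The upstream name, server address, and all timeout values are placeholders, not our real configuration. The Origin part describes the behaviour we would like, not what we observe today: as described above, idle keep-alive connections are currently closed at reload regardless of `worker_shutdown_timeout`.

```nginx
# --- Remote POP (illustrative; names and values are placeholders) ---
upstream origin_dc {
    server origin.example.com:443;
    keepalive 32;            # cache up to 32 idle connections per worker
    keepalive_timeout 30s;   # drop idle cached connections after 30s
}

server {
    location / {
        proxy_pass https://origin_dc;
        proxy_http_version 1.1;          # required for upstream keep-alive
        proxy_set_header Connection "";  # do not forward "Connection: close"
    }
}

# --- Origin DC (main context) ---
# Desired, not current, behaviour: old workers would keep idle keep-alive
# connections open for up to this long after a reload, which is longer
# than the POP's 30s upstream keepalive_timeout above.
worker_shutdown_timeout 60s;
```

With values chosen this way, the POP would always retire a cached connection before the Origin worker closes it, so the two sides would no longer race.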

/David

Posted at Nginx Forum: https://forum.nginx.org/read.php?2,197927,283564#msg-283564