Introducing backend healthchecking plugin

Tue Mar 2 21:17:07 MSK 2010

On Mar 02, Piotr Sikora wrote:
>Yeah, that's probably what Grzegorz meant. You would just need to call 
>ngx_supervisord_execute(uscf, NGX_SUPERVISORD_CMD_STOP, backend_number, 
>NULL) and then all ngx_supervisord-aware load balancers (upstream_fair, 
>round_robin & ip_hash) would automagically stop using failed backend until 
>you would execute NGX_SUPERVISORD_CMD_START.
>
>Full API spec is available at:
>http://github.com/FRiCKLE/ngx_supervisord/blob/master/patches/ngx_http_upstream_round_robin.patch
>
>At the moment one would need to specify "supervisord none;" in order to 
>enable supervisord-less configuration, because there is no such call in API, 
>but I could add this in next release if you would like to use it.

This sounds like what I needed. So let me rephrase the problem in its
entirety:

(1) Under normal circumstances, nginx would use proxy_next_upstream in
conjunction with max_fails and fail_timeout (for the rr module) to
declare an upstream as up or down. This is an inline check since it is
monitoring real traffic.

(2) In addition, the health-check module provides an out-of-band health
check mechanism: it periodically polls a specific URL and uses the HTTP
status/body to determine whether an upstream needs to be marked as up or
down.

Both these styles have their own benefits.

The first style keeps track of the health by looking at the response of
actual requests. This is important since a health check url does not
automatically indicate the health of your real application.

The second style is needed in cases where we plan to do some maintenance
activity on an upstream server and want to proactively stop sending
traffic to it. A typical example is when you want to push new software,
check it with a couple of requests to see whether your app is behaving
well and, if all looks fine, direct traffic to it.

Without a defined priority between the two styles of checking, we could
end up with a flapping upstream status. The logical priority seems to be
that #2 wins over #1. So, if the health check URL says an upstream
server is down, no traffic should be sent its way and the health status
evaluation based on style #1 should be ignored. If the health check
deems an upstream to be up, then the outcome of #1 is the final status.
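
To make the priority rule concrete, here is a minimal sketch in C. The
struct and function names are made up for illustration and do not come
from nginx, the health-check module or ngx_supervisord; it only shows
the "#2 wins over #1" decision as I understand it.

```c
#include <stdbool.h>

/* Hypothetical per-backend state; these names are illustrative only
 * and are not taken from nginx or ngx_supervisord. */
typedef struct {
    bool healthcheck_up;  /* style #2: verdict of the last out-of-band poll */
    bool inline_up;       /* style #1: max_fails/fail_timeout verdict */
} backend_status_t;

/* Style #2 wins over style #1: a backend that the health check marks
 * down is never used; otherwise the inline verdict is final. */
static bool
backend_usable(const backend_status_t *b)
{
    if (!b->healthcheck_up) {
        return false;     /* style #1 is ignored entirely */
    }

    return b->inline_up;
}
```

With this rule a backend only receives traffic when both checks agree
that it is up, and the health check alone is enough to take it out.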

So where do we get these features from:
 - #1 is provided by the stock upstream modules
 - #2 is provided by the health check module
 - the ngx_supervisord module seems to have the hooks that will let us
   achieve the prioritization once the health-check module uses them

To get all of this running, we would need two patches on the upstream
module: one for supervisord and the other for health-check. The
health-check module itself will also have to invoke
ngx_supervisord_execute to mark an upstream as up or down.
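
As a rough sketch of that last step, the health-check side could turn
poll results into commands like this. The enum and function below are
stand-ins that I made up, not the real ngx_supervisord API; the real
call would be ngx_supervisord_execute with NGX_SUPERVISORD_CMD_STOP or
NGX_SUPERVISORD_CMD_START as Piotr described. Issuing a command only
when the state changes is my own assumption, to avoid repeating the
same command on every poll.

```c
#include <stdbool.h>

/* Stand-ins for the ngx_supervisord commands; hypothetical names. */
typedef enum {
    SKETCH_CMD_NONE,   /* state unchanged, nothing to send */
    SKETCH_CMD_STOP,   /* would map to NGX_SUPERVISORD_CMD_STOP */
    SKETCH_CMD_START   /* would map to NGX_SUPERVISORD_CMD_START */
} sketch_cmd_t;

/* Given the previously reported state and the result of the latest
 * poll, decide which command (if any) to issue, and remember the new
 * state for the next poll. */
static sketch_cmd_t
sketch_next_cmd(bool *last_up, bool now_up)
{
    if (*last_up == now_up) {
        return SKETCH_CMD_NONE;
    }

    *last_up = now_up;

    return now_up ? SKETCH_CMD_START : SKETCH_CMD_STOP;
}
```

The caller would then pass the resulting command and the backend number
on to ngx_supervisord_execute whenever the result is not "none".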

I will not have time before this weekend to get started on merging
these, so if someone gets around to it earlier, thanks :-)