Feature request: Run a script when upstream detected down/up

Tue Apr 29 01:18:37 MSD 2008

On Mon, 2008-04-28 at 14:02 -0700, Rt Ibmer wrote:
> >This sounds like a job for a heartbeat monitor, not a web server.  
> 
> For our needs this would be best handled by nginx.  Here's why...
> Nginx is the first one to know that it considers a server down and has
> stopped routing traffic to it until fail_timeout occurs. 

Well, it *might* be, depending on the timing of the heartbeat and
whether/when a particular request causes Nginx to try that backend.

> So regardless of whether it's right and the upstream is really down,
> or was tripped by a false positive, the bottom line is that it is now
> ignoring that upstream for fail_timeout duration.
> 
> Currently nginx is the only one that knows this.  So yes I can use
> Heartbeat or whatever other monitoring tools are out there.  But those
> tools can say an upstream is up, or down, but nginx could have the
> upstream's state differently (i.e., monitoring could say it's up when in
> fact it missed a condition that nginx considered the upstream to be
> down - so the monitoring goes on saying the upstream is fine, while
> nginx is treating it as offline - and all the while we have no idea of
> this).
> 
> Bottom line is that it doesn't make any difference whether a
> monitoring script says an upstream server is down or not.  What
> matters is whether nginx considers it down or not.  And for me to know
> that, nginx needs to tell me.

But it does: it's in your error logs.  There are alternative loggers
that can even run a script whenever a regex matches a log line (metalog,
for one).  I've used metalog successfully to deter brute-force ssh
attacks, for example.

http://metalog.sourceforge.net/

Metalog is available in most Linux distros (I've used it on Gentoo and
Fedora).
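
If you'd rather not switch loggers, the same idea can be sketched in a
few lines of Python: read the error log line by line and call a handler
whenever a line looks like an upstream failure.  This is only a rough
sketch.  The regex is an assumption based on typical nginx error-log
wording (check it against your own logs), and the script path and the
names find_upstream_failure, watch and run_handler are made up for
illustration.

```python
import re
import subprocess

# ASSUMPTION: typical nginx error-log phrases for upstream trouble.
# The exact wording can differ between versions; verify against real logs.
UPSTREAM_ERROR = re.compile(
    r'(?:no live upstreams|upstream timed out|connect\(\) failed)'
    r'.*?upstream: "(?P<upstream>[^"]+)"'
)


def find_upstream_failure(line):
    """Return the upstream named in a failure line, or None if no match."""
    match = UPSTREAM_ERROR.search(line)
    return match.group("upstream") if match else None


def watch(lines, on_failure):
    """Call on_failure(upstream) for every line that reports a failure."""
    for line in lines:
        upstream = find_upstream_failure(line)
        if upstream is not None:
            on_failure(upstream)


def run_handler(upstream, script="/usr/local/bin/upstream-down.sh"):
    """Run a (hypothetical) alert script with the upstream as its argument.

    Blocking here is harmless: this runs in the watcher process,
    not inside nginx.
    """
    try:
        subprocess.run([script, upstream], timeout=30, check=False)
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"handler failed for {upstream}: {exc}")
```

To use it, pipe something like `tail -F` on the error log into a small
script that calls watch(sys.stdin, run_handler).  Note that this reports
individual failed attempts, which is not quite the same thing as the
moment nginx marks the upstream down after max_fails.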

> The beauty of it is that it seems like quite a trivial yet very useful
> function to implement.  Basically wherever the code is that decides
> to ignore an upstream for fail_timeout, it just needs to call out to
> some script to launch it and pass it a param like the name of the
> upstream entity that went down.  Seems like something that could be
> done in just minutes.  Unfortunately I'm not a coder or I would take a
> crack at it.

Except that Nginx is asynchronous, not threaded.  Each worker is a
single event loop, so if it launches your script synchronously, every
connection on that worker is delayed until the launch returns (and what
if the script hangs or fails?).  You might be able to work around this,
but I suspect it won't be as trivial as you might hope.
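
To give a feel for what such a workaround involves, here is a small
Python asyncio sketch.  It is not how Nginx works internally (Nginx is
written in C and has its own event loop); it only illustrates the three
things an event-driven server would have to do: start the script without
waiting for it, put a time limit on it, and cope with a failing exit
code.  The names notify, serve_ticks and main are invented for the
example, and the "scripts" are just short Python one-liners.

```python
import asyncio
import sys


async def notify(cmd, timeout=5.0):
    """Run cmd without blocking the loop.

    Returns the exit code, or None if the command exceeded the timeout.
    """
    proc = await asyncio.create_subprocess_exec(*cmd)
    try:
        return await asyncio.wait_for(proc.wait(), timeout)
    except asyncio.TimeoutError:
        proc.kill()        # do not let a hung script linger
        await proc.wait()  # reap the child so it is not left as a zombie
        return None


async def serve_ticks(n, interval=0.05):
    """Stand-in for request handling: count how often the loop gets to run."""
    ticks = 0
    for _ in range(n):
        await asyncio.sleep(interval)
        ticks += 1
    return ticks


async def main():
    slow = [sys.executable, "-c", "import time; time.sleep(0.3)"]
    failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    # Both scripts run while serve_ticks keeps making progress.
    return await asyncio.gather(notify(slow), notify(failing), serve_ticks(5))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The slow script and the failing one both complete while the "server"
loop keeps ticking, and the caller still has to decide what to do with a
non-zero exit code or a timeout.  That bookkeeping is the non-trivial
part.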

Regards,
Cliff