Is it possible to monitor the fair proxy balancer?

Sun Jun 29 03:02:34 MSD 2008

On Sat, Jun 28, 2008 at 9:54 PM, Grzegorz Nosek
<grzegorz.nosek at gmail.com> wrote:
> I'd like to gather ideas about
> how to notify the outside world. A log message? Sending a signal
> somewhere? An SNMP trap? Every way has its advantages and disadvantages,
> so I'd like to pick the one that sucks the least.

Why just one? A status page supplemented by machine-readable log
output is a good solution that I think would satisfy most sysadmins.

>> Pardon me for asking a naive question, but to change the list of
>> backends, would you not simply edit the config file and do a SIGHUP? I
>> would reset whatever internal structures that are kept by the workers,
>> but I can't think of anything that's not okay to lose.
>
> Yes. That's the obvious solution but apparently not always acceptable,
> especially when you'd want to use an external monitoring system to do
> this automatically.

What's simpler for an external monitoring system than sending a signal
to a process?
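For instance, a monitoring script only needs the master's PID. Here is a
minimal Python sketch. It assumes the PID file lives at
/var/run/nginx.pid, although the real path depends on the `pid` directive
and on build options. `reload_nginx` is a hypothetical helper name:

```python
import os
import signal


def reload_nginx(pid_file="/var/run/nginx.pid"):
    """Send SIGHUP to the nginx master so it re-reads its configuration.

    Hypothetical helper; the default pid_file path is an assumption and
    depends on the `pid` directive and build options.
    """
    with open(pid_file) as f:
        pid = int(f.read().strip())
    # Guard against 0 or negative values, which os.kill would treat as
    # "signal a whole process group" instead of a single process.
    if pid <= 0:
        raise ValueError(f"bad pid {pid} in {pid_file}")
    # On SIGHUP the nginx master re-reads the config, starts new workers
    # and gracefully shuts down the old ones.
    os.kill(pid, signal.SIGHUP)
    return pid
```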

Of course, you could go all the way and do a Varnish-style admin
interface. I have mentioned Varnish before on this list. Varnish has a
pretty clever admin/monitoring infrastructure. For example, you can
load multiple configs and selectively enable them:

$ varnishadm vcl.load test /etc/varnish/test.vcl
$ varnishadm vcl.use test
# ... something goes horribly wrong ...
$ varnishadm vcl.use boot

Because configs are named, the input can be anything (even your default
set of config files): you can load it under a new name, try it out, and
unload it again.
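The named-config workflow boils down to "load under a name, switch a
pointer, discard when done". Here is a minimal Python sketch of that idea.
`ConfigRegistry` and its methods are hypothetical and only modelled on the
varnishadm vcl.load / vcl.use / vcl.discard commands. They are not a
Varnish or nginx API:

```python
from dataclasses import dataclass, field


@dataclass
class ConfigRegistry:
    """Hypothetical registry of named configs (modelled on varnishadm)."""
    configs: dict = field(default_factory=dict)
    active: str = ""

    def load(self, name: str, text: str) -> None:
        # Loading only stores the config under a name; it is not activated.
        if name in self.configs:
            raise ValueError(f"config {name!r} already loaded")
        self.configs[name] = text

    def use(self, name: str) -> None:
        # Switching is just a pointer change, so rolling back is cheap.
        if name not in self.configs:
            raise KeyError(name)
        self.active = name

    def discard(self, name: str) -> None:
        # Refuse to unload the config that is currently in use.
        if name == self.active:
            raise ValueError("cannot discard the active config")
        del self.configs[name]


reg = ConfigRegistry()
reg.load("boot", "# original config")
reg.use("boot")
reg.load("test", "# experimental config")
reg.use("test")
# ... something goes horribly wrong ...
reg.use("boot")
reg.discard("test")
```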

You could do worse than looking at Varnish's logging system for ideas.
Varnish logs to circular buffers in shared memory, and its logs are
explicitly machine-readable: each line is a tag followed by a value,
prefixed by a numeric ID and a c/b marker (client or backend side). So
log output looks like this:

   14 Debug        c "Hash Match: /-/cache/border/w=6;h=6;sw=true;sx=0;sy=3;sbr=10;sbs=5;sm=10;sp=0;c=fff;t=r_24.png#origo.no#"
   14 Hit          c 1402130806
   14 VCL_call     c hit
   14 VCL_return   c deliver
   14 Length       c 217
   14 VCL_call     c deliver
   14 VCL_return   c deliver
   14 TxProtocol   c HTTP/1.1
   14 TxStatus     c 200
   14 TxResponse   c OK
   14 TxHeader     c Status: 200 OK

and so on.
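As a rough illustration of why this format is script-friendly, one
`split` per line recovers the fields. Here is a minimal Python sketch. It
assumes the four-column layout shown above (numeric ID, tag, c/b side
marker, value). `parse_line` and `LogRecord` are hypothetical names, not
part of Varnish:

```python
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    fd: int        # numeric ID in the first column (assumed: session/fd)
    tag: str       # record type, e.g. "TxStatus"
    side: str      # "c" = client side, "b" = backend side
    value: str     # free-form payload; may contain spaces


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one varnishlog-style line; return None if it does not fit."""
    # maxsplit=3 keeps everything after the third field as the value.
    parts = line.split(maxsplit=3)
    if len(parts) < 3 or not parts[0].isdigit():
        return None
    value = parts[3].rstrip() if len(parts) == 4 else ""
    return LogRecord(int(parts[0]), parts[1], parts[2], value)


sample = """\
   14 Hit          c 1402130806
   14 TxStatus     c 200
   14 TxHeader     c Status: 200 OK
"""
records = [r for r in map(parse_line, sample.splitlines()) if r]
by_tag = {r.tag: r.value for r in records}
```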

In addition to making it superbly easy for scripts to graph, analyze
and monitor activity in real time, this lets you tail the log for
specific events or strings. And since it's all RAM-based, you get
real-time, low-overhead debug output immediately, without changing any
configuration settings or reloading the daemon. As far as I know,
Varnish only logs when something is listening to the log output, and
only what the listener is filtering for, but I could be wrong.
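The "only log what someone is listening for" behaviour described above
can be made concrete with a minimal in-process sketch. `TagLog`, `listen`
and `log` are hypothetical names. This illustrates the idea only and does
not show how Varnish actually implements its log:

```python
class TagLog:
    """Hypothetical tag/value logger: it does no work unless a listener is
    attached, and each listener only receives the tags it subscribed to."""

    def __init__(self):
        self._listeners = []  # list of (frozenset of tags, callback) pairs

    def listen(self, tags, callback):
        self._listeners.append((frozenset(tags), callback))

    def log(self, tag, value):
        # Fast path: with nobody listening, a log call is a single check.
        if not self._listeners:
            return
        for tags, callback in self._listeners:
            if tag in tags:
                callback(tag, value)


seen = []
log = TagLog()
log.log("TxStatus", "200")          # dropped: nobody is listening yet
log.listen({"TxStatus"}, lambda tag, value: seen.append((tag, value)))
log.log("Hit", "1402130806")        # dropped: tag not subscribed
log.log("TxStatus", "404")          # delivered to the listener
```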

Using shared memory with Nginx's worker process model should not pose
any problems, as each worker could maintain its own shared memory
segment and thus avoid the need for locking between workers.
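A per-worker circular buffer only needs a write counter that readers can
compare against their own cursor. Here is a minimal Python sketch of that
indexing. `RingLog` is a hypothetical name, and this in-process version
does not use real shared memory or handle a reader racing the writer:

```python
from typing import List, Optional, Tuple


class RingLog:
    """Fixed-size circular log with exactly one writer (e.g. one nginx
    worker). Hypothetical sketch: a real version would keep `slots` and
    `head` in a shared memory segment and would need memory barriers plus
    a re-check of `head` after copying to detect overwritten slots."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.slots: List[Optional[str]] = [None] * size
        self.head = 0  # total records ever written; only grows

    def write(self, record: str) -> None:
        # Only the owning worker calls this, so writers never contend.
        self.slots[self.head % self.size] = record
        self.head += 1  # advance only after the slot is filled

    def read_from(self, cursor: int) -> Tuple[List[str], int, int]:
        """Return (records, new_cursor, dropped) for everything written
        since `cursor`; `dropped` counts records already overwritten."""
        oldest = max(0, self.head - self.size)
        dropped = max(0, oldest - cursor)
        start = max(cursor, oldest)
        records = [self.slots[i % self.size] for i in range(start, self.head)]
        return records, self.head, dropped


ring = RingLog(4)
for i in range(6):
    ring.write(f"rec{i}")
records, cursor, dropped = ring.read_from(0)  # reader fell behind by 2
ring.write("rec6")
newer, cursor, _ = ring.read_from(cursor)
```

Because each reader keeps its own cursor, a slow reader only loses old
records; it never slows down the worker that writes them.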

>> >  - a new option, e.g. max_requests 10 10 20 20 (specifying the number
>> >   for each backend in the order of server directives)
>>
>> That's a horrible syntax and one that is going to cause problems as
>> you add or remove backends from the config. A max_requests setting
>> belongs on each backend declaration.
>
> Like I wrote in the snipped part, I cannot easily add options to the
> server directives (at least without patching nginx or reinventing the
> square wheel). I don't like the max_requests idea either, for precisely the
> same reason. I presume that means the overloading of weight=X is at
> least acceptable.

I think you have to push Igor for a more flexible internal infrastructure. :-)

Something string-based would work, even if it is hackier than a proper
syntax:

  server 127.0.0.1:10000 option <key>=<value> [option ...];

For example:

  server 127.0.0.1:10000 option fair.max_conns=5;
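The string-based form is easy to parse, and dotted keys let each module
claim its own namespace. Here is a minimal Python sketch. The
`option key=value` syntax is only the proposal above and is not something
nginx accepts. `parse_server` and `module_options` are hypothetical names:

```python
def parse_server(line):
    """Parse the proposed (hypothetical) syntax
    `server <addr> option <key>=<value> [option ...];`
    into (addr, {key: value})."""
    line = line.strip()
    if not line.endswith(";"):
        raise ValueError("directive must end with ';'")
    tokens = line[:-1].split()
    if len(tokens) < 2 or tokens[0] != "server":
        raise ValueError("expected 'server <addr> ...'")
    addr, rest = tokens[1], tokens[2:]
    # Options must come in pairs: the keyword "option", then key=value.
    if len(rest) % 2:
        raise ValueError("options must be 'option key=value' pairs")
    options = {}
    for kw, pair in zip(rest[::2], rest[1::2]):
        if kw != "option" or "=" not in pair:
            raise ValueError(f"bad option: {kw} {pair}")
        key, value = pair.split("=", 1)
        options[key] = value
    return addr, options


def module_options(options, prefix):
    # Each module only looks at keys in its own namespace, e.g. "fair.".
    return {k[len(prefix):]: v for k, v in options.items() if k.startswith(prefix)}


addr, opts = parse_server("server 127.0.0.1:10000 option fair.max_conns=5;")
```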

>> You should only return an error if a request cannot be served within a
>> given timeout, not when all backends are full.
>
> Will have to think about it. This has the potential of busy-looping when
> all the backends are indeed full (or down, but then one can just send a
> hard error and be done with it). I don't think nginx has a way to be
> told "everything is unavailable now, come back to me in a second or
> two" or even better "I'll tell you when to ask me again".

I think Nginx needs something like this.
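To make "I'll tell you when to ask me again" concrete, here is a minimal
Python sketch using `threading.Condition.wait_for`. A request waits until
a backend frees a slot or a timeout expires, and there is no busy loop.
`BackendPool`, `acquire` and `release` are hypothetical names. The
least-loaded choice is an assumption, not the fair balancer's actual
algorithm. nginx workers are event-driven rather than threaded, so nginx
would need an event-based equivalent of this blocking wait:

```python
import threading


class BackendPool:
    """Hypothetical pool of backends, each with a max number of concurrent
    requests. acquire() waits for a free slot instead of failing at once."""

    def __init__(self, limits):
        self._limits = dict(limits)            # backend -> max concurrent requests
        self._active = {b: 0 for b in self._limits}
        self._cond = threading.Condition()

    def _free_backend(self):
        # Least-loaded backend that still has a free slot, or None.
        free = [b for b in self._limits if self._active[b] < self._limits[b]]
        return min(free, key=lambda b: self._active[b], default=None)

    def acquire(self, timeout):
        with self._cond:
            # wait_for re-checks the predicate only when notified or on
            # timeout, so nothing spins while all backends are full.
            if not self._cond.wait_for(lambda: self._free_backend() is not None, timeout):
                return None                    # caller turns this into an error
            backend = self._free_backend()
            self._active[backend] += 1
            return backend

    def release(self, backend):
        with self._cond:
            self._active[backend] -= 1
            self._cond.notify()                # wake one waiting request


pool = BackendPool({"a": 1})
first = pool.acquire(timeout=0.1)    # gets "a"
second = pool.acquire(timeout=0.1)   # gives up after ~0.1 s: everything is full
threading.Timer(0.05, pool.release, args=("a",)).start()
third = pool.acquire(timeout=1.0)    # woken by release()
```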

Alexander.