Feature request: monitoring Nginx from the outside
fb at francois.battail.name
Fri May 2 17:51:13 MSD 2008
On Friday 02 May 2008 at 10:52 +0200, Manlio Perillo wrote:
> The problem with this is that the script can arbitrarily block Nginx if
> it holds the lock for too long.
Of course I will not call sem_wait() but sem_trywait()! If Nginx cannot
write because a script holds the semaphore, the script will simply read
the old values; I don't see an issue.
> Ok, but I think that providing a file system interface is not the better
> If you want to monitor global variables, then you can use the
> stub_status module (maybe adding new global shared variables).
stub_status works, but it's not the cleanest code in Nginx, and there's
no simple way to extend the set of watched variables, since you need to
modify other modules specifically, using conditional compilation blocks.
If you modify stub_status, you potentially break the Collectd and Nagios
plugins that rely on it.
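To illustrate why changing stub_status is risky: agents tend to parse its page with a fixed layout. The sketch below is such a parser, written against the layout stub_status is commonly seen to produce (Active connections, then accepts/handled/requests, then Reading/Writing/Waiting); the struct and function names are hypothetical, not taken from Collectd or Nagios. Inserting any new line, for instance a hypothetical "Buffer overflows: 3", makes the match fail.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of the stub_status page as a fixed-layout parser sees them. */
typedef struct {
    long active, accepts, handled, requests, reading, writing, waiting;
} stub_status_t;

/* Returns 0 on success, -1 if the text does not match the expected layout.
 * Whitespace in the format matches any run of spaces and newlines, but the
 * literal words must appear exactly in this order. */
static int parse_stub_status(const char *text, stub_status_t *s)
{
    int n;

    assert(text != NULL && s != NULL);
    n = sscanf(text,
               "Active connections: %ld "
               "server accepts handled requests %ld %ld %ld "
               "Reading: %ld Writing: %ld Waiting: %ld",
               &s->active, &s->accepts, &s->handled, &s->requests,
               &s->reading, &s->writing, &s->waiting);
    return n == 7 ? 0 : -1;
}
```

Any extension that keeps existing agents working would have to append fields strictly after the ones they already match, which is exactly the constraint a generic key:value interface avoids.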
A file system interface is universal, so it would be easy to use
whatever tool you want. A monitoring agent written in C will happily
read a file, and will be a little less happy if it needs an HTTP library
or has to run wget to fetch the data.
That's why I propose two things:
1) A generic interface for monitoring agents
The simplest one: a file-like interface with a list of key:value pairs.
Even if the monitoring agent doesn't know the semantics of a key, it can
report the value back and a graph can be drawn. Of course it's possible
to modify stub_status (and break compatibility) to do the same thing,
but that would be of no help for point 2. I don't know yet whether it
will be shared memory or a regular mmap()ed file (file locking on Unix
systems is a complete mess :( ).
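A minimal sketch of the key:value format, assuming one "key:value" pair per line with integer values (the proposal does not fix the exact format). monitor_render() is a hypothetical writer side whose buffer could be a shared memory segment or an mmap()ed file; monitor_parse_line() is a hypothetical agent side that forwards pairs without knowing what the keys mean.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Writer side (hypothetical): one "key:value\n" line per variable.
 * Returns the number of bytes written, or -1 if buf is too small. */
static int monitor_render(char *buf, size_t size, const char *const keys[],
                          const long values[], size_t n)
{
    size_t off = 0;

    if (size == 0) {
        return -1;
    }
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + off, size - off, "%s:%ld\n", keys[i], values[i]);
        if (w < 0 || (size_t) w >= size - off) {
            return -1;  /* output would be truncated */
        }
        off += (size_t) w;
    }
    return (int) off;
}

/* Agent side (hypothetical): split one line on its LAST colon, so keys
 * such as "upstream-status-10.0.0.1:8080" stay intact.
 * Returns 0 on success, -1 on a malformed line. */
static int monitor_parse_line(const char *line, char *key, size_t key_size,
                              long *value)
{
    const char *colon;
    char *end;
    long v;
    size_t key_len;

    assert(line != NULL && key != NULL && value != NULL);
    colon = strrchr(line, ':');
    if (colon == NULL || colon == line) {
        return -1;
    }
    key_len = (size_t) (colon - line);
    if (key_len >= key_size) {
        return -1;
    }
    errno = 0;
    v = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || errno == ERANGE) {
        return -1;
    }
    while (*end == ' ' || *end == '\r' || *end == '\n') {
        end++;  /* tolerate trailing whitespace and the line terminator */
    }
    if (*end != '\0') {
        return -1;
    }
    memcpy(key, line, key_len);
    key[key_len] = '\0';
    *value = v;
    return 0;
}
```

Splitting on the last colon is a deliberate choice: since values are plain integers, it lets keys carry host:port pairs without any escaping.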
2) An API
An API to help other modules provide variables for monitoring. At the
cost of one indirection, it would be possible to choose at runtime
whether a variable is monitored, and only then perform an atomic_t
operation. If a module offers some variables for monitoring, the user
can choose whether or not to monitor them. That brings value to the
software *and* to the user.
The API could be as simple as:
(ngx_str_t * name, ngx_str_t * command_name, ngx_int_t option) ;
(ngx_monitoring_value_t * value, ngx_int_t nbr) ;
(ngx_monitoring_value_t * value, ngx_int_t new_value) ;
For example, in the case of the upstream round-robin module, the code
would look like this (pseudo code):
servers = array of ngx_monitoring_value_t * [nbr_servers]
for each upstream server i:
    servers[i] = ngx_register_monitoring_value(
        "upstream-status-" + server_name[i], "upstream_server_status", 0);
when server i changes state:
    if (event == down)
        ngx_monitoring_value_set(servers[i], 0);
    else if (event == up)
        ngx_monitoring_value_set(servers[i], 1);
Just put "monitor upstream_server_status;" in nginx.conf and my module
will do all the atomic_t work; otherwise it will use plain, non-atomic
operations. Runtime cost: one function call and one conditional per
variable update.
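A standalone mock of the proposal, to make that runtime cost concrete. Only the names ngx_monitoring_value_t, ngx_register_monitoring_value and ngx_monitoring_value_set come from the text above; the fixed-size table, the mocked list of commands enabled by "monitor ...;" directives, the monitoring_value_get() helper, long in place of ngx_int_t, and C11 stdatomic.h in place of Nginx's ngx_atomic_t are all assumptions. None of this is Nginx's actual API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* MOCK of the proposed API, not Nginx code. */
typedef struct {
    char        name[64];
    int         monitored;  /* decided once, at registration time */
    atomic_long shared;     /* written atomically when monitored */
    long        local;      /* plain store when not monitored */
} ngx_monitoring_value_t;

/* Hypothetical stand-in for the "monitor <command>;" lines in nginx.conf. */
static const char *enabled_commands[] = { "upstream_server_status", NULL };

static ngx_monitoring_value_t values_table[32];
static size_t values_used;

static ngx_monitoring_value_t *
ngx_register_monitoring_value(const char *name, const char *command_name,
                              long option)
{
    ngx_monitoring_value_t *v;

    (void) option;  /* meaning not specified in the proposal; ignored */
    if (values_used == sizeof values_table / sizeof values_table[0]) {
        return NULL;
    }
    v = &values_table[values_used++];
    strncpy(v->name, name, sizeof v->name - 1);
    v->name[sizeof v->name - 1] = '\0';
    v->monitored = 0;
    for (size_t i = 0; enabled_commands[i] != NULL; i++) {
        if (strcmp(enabled_commands[i], command_name) == 0) {
            v->monitored = 1;
            break;
        }
    }
    atomic_init(&v->shared, 0);
    v->local = 0;
    return v;
}

/* The cost from the proposal: one call and one conditional per update. */
static void ngx_monitoring_value_set(ngx_monitoring_value_t *v, long new_value)
{
    assert(v != NULL);
    if (v->monitored) {
        atomic_store(&v->shared, new_value);  /* visible to an agent */
    } else {
        v->local = new_value;                 /* ordinary store */
    }
}

/* Hypothetical read helper, used here only to check the stored value. */
static long monitoring_value_get(ngx_monitoring_value_t *v)
{
    return v->monitored ? atomic_load(&v->shared) : v->local;
}
```

With this mock, the round-robin pseudo code maps directly onto registering one value per server and calling ngx_monitoring_value_set(servers[i], 0) on a down event; the atomic store only happens when the command was enabled in the configuration.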
> If you want to monitor things like gzip compression ratio, then just
> implement a custom variable $gzip_ratio that the user can use in the log
OK, gzip ratio was not the best real-life example ;-) but imagine you
have MRTG graphs and important values in the log: you run stress tests
for 24 h, and 1.4 × 10^9 requests later the error log is 100 MB long.
Exploiting that log to correlate it with load, for example, looks like
a nightmare.
Here is a different example where I want logging *and* monitoring. I
have a special Nginx module that uses a circular buffer to communicate
with threads. If the buffer overflows, I log it, but it would be nice
(for me) to also have a circular buffer overflow counter in the set of
monitored values. Of course I can "hack" stub_status and the collectd
plugin, but it's better to propose a more general solution that breaks
nothing and has no significant performance cost.
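A sketch of such an overflow counter, under stated assumptions: a hypothetical fixed-capacity ring of ints, with the producer/consumer synchronization between threads deliberately left out. Only the overflow counter is a C11 atomic, so an external reader could sample it without touching the ring; the module would still log each overflow as it does today.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_CAPACITY 4   /* hypothetical, kept tiny for the example */

/* Hypothetical ring; locking between producer and consumer is omitted. */
typedef struct {
    int          slots[RING_CAPACITY];
    size_t       head;       /* next slot to read */
    size_t       tail;       /* next slot to write */
    size_t       count;
    atomic_ulong overflows;  /* the value worth exporting for monitoring */
} ring_t;

static void ring_init(ring_t *r)
{
    assert(r != NULL);
    r->head = r->tail = r->count = 0;
    atomic_init(&r->overflows, 0);
}

/* Returns 0 if queued, -1 if the ring was full (item dropped). */
static int ring_push(ring_t *r, int item)
{
    if (r->count == RING_CAPACITY) {
        /* besides logging the overflow (e.g. with ngx_log_error()),
         * bump the counter that a monitoring agent can read */
        atomic_fetch_add(&r->overflows, 1);
        return -1;
    }
    r->slots[r->tail] = item;
    r->tail = (r->tail + 1) % RING_CAPACITY;
    r->count++;
    return 0;
}

/* Returns 0 and stores the oldest item, or -1 if the ring is empty. */
static int ring_pop(ring_t *r, int *item)
{
    if (r->count == 0) {
        return -1;
    }
    *item = r->slots[r->head];
    r->head = (r->head + 1) % RING_CAPACITY;
    r->count--;
    return 0;
}
```

The counter is cumulative, so an agent graphing it sees overflow bursts directly on the same time axis as its load graphs, without digging through the error log.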
Thank you very much for your time and your input, Manlio. Even if we
don't agree on some points, it is very stimulating for me to have
someone like you challenging my ideas.