Monitoring http returns

Wed Apr 11 06:04:05 UTC 2018

On Wed, Apr 11, 2018 at 01:17:14AM -0400, Peter Booth wrote:
> There are some very good reasons for doing things in what sounds
> like a heavy inefficient manner.

I suspected as much; thanks for the explanations.

> The first point is that there are some big differences between
> application code /business logic and monitoring code:
> 
> [...]

Good summary, I agree with you.

> Tailing a log file doesn't sound sexy, but it's also pretty hard to
> mess it up. I monitored a high traffic email site with a very short
> Ruby script that would tail an nginx log, pushing messages ten at a
> time as UDP datagrams to an influxdb.  The script would do its thing
> for 15 mins then die. cron ensured a new instance started every 15
> minutes. It was more efficient than a shell script because it didn't
> start new processes in a pipeline.

It's hard to mess up as long as you're not interested in
exactly-once. ;-)

The tail solution has two failure modes: (1) it can miss events, if
more arrive in the short gap between process death and process start
than tail picks up at startup, or if the log file rotates a few
seconds into that 15 minute period, and (2) it can duplicate events,
if very few arrive in that period.  Now, with telegraf/influx,
duplicates aren't a concern, because influx keys on series and
timestamp, so a re-sent point overwrites itself, and our site is
probably not getting so much traffic that a tail restart is a big
deal, although log rotation could lead to gaps we don't like.
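
To make the gap and duplicate cases concrete, here is a minimal
sketch (in Python, not the Ruby of your script) of a reader that
remembers where it stopped.  The function name read_new_lines and the
state file are hypothetical, and I assume a POSIX filesystem where
rotation renames the old log and creates a new file with a new inode.

```python
import os


def read_new_lines(log_path, state_path):
    """Return complete lines appended to log_path since the last call.

    The (inode, offset) pair saved in state_path lets a restarted
    process resume where the previous one stopped.
    """
    try:
        with open(state_path) as f:
            saved_inode, saved_offset = (int(x) for x in f.read().split())
    except (FileNotFoundError, ValueError):
        # No usable state yet: read the file from the beginning.
        saved_inode, saved_offset = None, 0

    with open(log_path, "rb") as log:
        st = os.fstat(log.fileno())
        # A different inode or a shrunken file means the log was rotated
        # or truncated: start again from the top of the current file.
        if st.st_ino != saved_inode or st.st_size < saved_offset:
            saved_offset = 0
        log.seek(saved_offset)
        data = log.read()

    # Keep only complete lines; a partial last line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()

    # Replace the state file atomically so a crash cannot leave it half-written.
    tmp = state_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(f"{st.st_ino} {saved_offset + end}")
    os.replace(tmp, state_path)
    return lines
```

This is still not exactly-once: lines written to the old file between
the last read and the rotation are lost unless the rotated file is
read too, and because the offset is saved before the caller sends
anything, a crash in between drops those lines.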

Of course, this is why Logwatch was written...

> I like the Scalyr guide but I disagree with their advice on active
> monitoring. I think it's smarter to use real user requests to test if
> servers are up. I have seen many high profile sites that end up
> serving more synthetic requests than real customer initiated
> requests.

I'm not sure I understood what you mean by "active monitoring".  I
took it to mean "sending http queries to see if they are handled
properly".

In that context: I think both submitting queries (from outside one's
own network) and passively watching stats on the service itself are
essential.  Passively watching stats gives me information on internal
state, useful in itself but also when debugging problems.  Active
monitoring from a different network can alert me to problems that may
not be specific to any one service and may even be at the network
level.

Of course, yes, active monitoring shouldn't be trying to DoS my
service. ;-)
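
For what I mean by an active check, a minimal sketch in Python using
only the standard library.  The names probe and is_healthy are
hypothetical, and treating any 2xx or 3xx status as healthy is my
assumption, not a rule.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Send one GET; return (status, seconds), status None on network failure."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx or 5xx status.
        status = err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # DNS failure, refused connection, timeout, or a broken response.
        status = None
    return status, time.monotonic() - start


def is_healthy(status):
    """Assumed policy: any 2xx or 3xx answer counts as up."""
    return status is not None and 200 <= status < 400
```

Run from a host outside the network at a modest interval (say, once a
minute from cron), this adds a negligible number of synthetic requests
while still catching failures that the service's own stats cannot
see.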

Jeff Abrahamson
https://www.p27.eu/jeff/

>     On 11 Apr 2018, at 12:19 AM, Jeff Abrahamson <jeff at p27.eu> wrote:
> 
>     I want to monitor nginx better: http returns (e.g., how many
>     500's, how many 404's, how many 200's, etc.), as well as request
>     rates, response times, etc.  All the solutions I've found start
>     with "set up something to watch and parse your logs, then ..."
> 
>     Here's one of the better examples of that:
> 
>         https://www.scalyr.com/community/guides/how-to-monitor-nginx-the-essential-guide
> 
>     Perhaps I'm wrong to find this curious.  It seems somewhat heavy
>     and inefficient to put this functionality into log watching,
>     which means another service and being sensitive to an eventual
>     change in log format.
> 
>     Is this, indeed, the recommended solution?
> 
>     And, for my better understanding, can anyone explain why this
>     makes more sense than native nginx support of sending UDP
>     packets to a monitor collector (in our case, telegraf)?
> 
>     --
> 
>     Jeff Abrahamson
>     +33 6 24 40 01 57
>     +44 7920 594 255
> 
>     http://p27.eu/jeff/