Monitoring http returns

Thu Apr 12 01:03:47 UTC 2018

Just to be clear, I’m not contrasting active synthetic testing with monitoring resource consumption. I think the highest-value variable is revenue ($), or whichever variables correlate most strongly with profit. The real customer experience is probably #2, after sales. Monitoring things like active connections, cache hit ratios, etc. is important for understanding “what is normal?” It’s easy for our mental model of how a site works to differ markedly from reality.
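
For active connections, one low-effort way to build that baseline is to poll nginx's stub_status page and parse the numbers out of it. A minimal sketch in Python, assuming the stub_status module is enabled on the server. The sample text, its numbers, and the function name parse_stub_status are illustrative only; fetching the page over HTTP is left out.

```python
import re

# Sample stub_status body (illustrative numbers, not from a real server).
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text):
    """Parse the plain-text body served by nginx's stub_status directive."""
    active = re.search(r"Active connections:\s+(\d+)", text)
    # The three cumulative counters sit alone on one line:
    # accepts, handled, requests.
    totals = re.search(
        r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE
    )
    states = re.search(
        r"Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)", text
    )
    if not (active and totals and states):
        raise ValueError("unrecognized stub_status output")

    accepts, handled, requests = (int(n) for n in totals.groups())
    reading, writing, waiting = (int(n) for n in states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        # Connections accepted but never handled, e.g. due to resource limits.
        "dropped": accepts - handled,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


if __name__ == "__main__":
    for name, value in parse_stub_status(SAMPLE).items():
        print(f"{name}: {value}")
```

Sampling this every minute or so and graphing it over a few weeks is usually enough to see what "normal" looks like. The accepts, handled and requests values are cumulative counters, so rates come from the difference between two samples.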

> On Apr 11, 2018, at 2:04 AM, Jeff Abrahamson <jeff at p27.eu> wrote:
> 
>> On Wed, Apr 11, 2018 at 01:17:14AM -0400, Peter Booth wrote:
>> There are some very good reasons for doing things in what sounds
>> like a heavy, inefficient manner.
> 
> I suspected, thanks for the explanations.
> 
> 
>> The first point is that there are some big differences between
>> application code/business logic and monitoring code:
>> 
>> [...]
> 
> Good summary, I agree with you.
> 
> 
>> Tailing a log file doesn't sound sexy, but it's also pretty hard to
>> mess it up. I monitored a high-traffic email site with a very short
>> Ruby script that would tail an nginx log, pushing messages ten at a
>> time as UDP datagrams to an InfluxDB instance. The script would do
>> its thing for 15 minutes, then die. cron ensured a new instance
>> started every 15 minutes. It was more efficient than a shell script
>> because it didn't start new processes in a pipeline.
> 
> It's hard to mess up as long as you're not interested in
> exactly-once. ;-)
> 
> The tail solution has the particularity that (1) it could miss things
> if the short gap between process death and process start sees more
> events than tail catches at startup or if the log file rotates a few
> seconds into that 15 minute period, and (2) it could duplicate things
> in case of very few events in that period.  Now, with telegraf/influx,
> duplicates aren't a concern, because influx keys on time, and our site
> is probably not getting so much traffic that a tail restart is a big
> deal, although log rotation could lead to gaps we don't like.
> 
> Of course, this is why Logwatch was written...
> 
> 
>> I like the Scalyr guide, but I disagree with their advice on active
>> monitoring. I think it's smarter to use real user requests to test if
>> servers are up. I have seen many high-profile sites that end up
>> serving more synthetic requests than real customer-initiated
>> requests.
> 
> I'm not sure I understood what you mean by "active monitoring".  I've
> understood "sending http queries to see if they are handled properly".
> 
> In that context: I think both submitting queries (from outside one's
> own network) and passively watching stats on the service itself are
> essential.  Passively watching stats gives me information on internal
> state, useful in itself but also when debugging problems.  Active
> monitoring from a different network can alert me to problems that may
> not be specific to any one service, and may even be at the network
> level.
> 
> Of course, yes, active monitoring shouldn't be trying to DoS my
> service. ;-)
> 
> Jeff Abrahamson
> https://www.p27.eu/jeff/
> 
> 
>>    On 11 Apr 2018, at 12:19 AM, Jeff Abrahamson <jeff at p27.eu> wrote:
>> 
>>    I want to monitor nginx better: http returns (e.g., how many
>>    500's, how many 404's, how many 200's, etc.), as well as request
>>    rates, response times, etc.  All the solutions I've found start
>>    with "set up something to watch and parse your logs, then ..."
>> 
>>    Here's one of the better examples of that:
>> 
>>        https://www.scalyr.com/community/guides/how-to-monitor-nginx-the-essential-guide
>> 
>>    Perhaps I'm wrong to find this curious.  It seems somewhat heavy
>>    and inefficient to put this functionality into log watching,
>>    which means another service and being sensitive to an eventual
>>    change in log format.
>> 
>>    Is this, indeed, the recommended solution?
>> 
>>    And, for my better understanding, can anyone explain why this
>>    makes more sense than native nginx support of sending UDP
>>    packets to a monitor collector (in our case, telegraf)?
>> 
>>    --
>> 
>>    Jeff Abrahamson
>>    +33 6 24 40 01 57
>>    +44 7920 594 255
>> 
>>    http://p27.eu/jeff/
> _______________________________________________
> nginx mailing list
> nginx at nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx