Monitoring http returns
peter_booth at me.com
Thu Apr 12 01:03:47 UTC 2018
Just to be clear, I’m not contrasting active synthetic testing with monitoring resource consumption. I think the highest-value variable is revenue ($), or whichever variables correlate most strongly with profit. The real customer experience is probably second, after sales. Monitoring things like active connections, cache hit ratios, etc. is important for understanding “what is normal?” It’s easy for our mental model of how a site works to differ markedly from reality.
> On Apr 11, 2018, at 2:04 AM, Jeff Abrahamson <jeff at p27.eu> wrote:
>> On Wed, Apr 11, 2018 at 01:17:14AM -0400, Peter Booth wrote:
>> There are some very good reasons for doing things in what sounds
>> like a heavy, inefficient manner.
> I suspected as much; thanks for the explanations.
>> The first point is that there are some big differences between
>> application code/business logic and monitoring code:
> Good summary, I agree with you.
>> Tailing a log file doesn't sound sexy, but it's also pretty hard to
>> mess it up. I monitored a high-traffic email site with a very short
>> Ruby script that would tail an nginx log, pushing messages ten at a
>> time as UDP datagrams to InfluxDB. The script would do its thing
>> for 15 minutes, then die. cron ensured a new instance started every 15
>> minutes. It was more efficient than a shell script because it didn't
>> start new processes in a pipeline.
> It's hard to mess up as long as you're not interested in
> exactly-once. ;-)
> The tail solution has the particularity that (1) it could miss things
> if the short gap between process death and process start sees more
> events than tail catches at startup or if the log file rotates a few
> seconds into that 15 minute period, and (2) it could duplicate things
> in case of very few events in that period. Now, with telegraf/influx,
> duplicates aren't a concern, because influx keys on time, and our site
> is probably not getting so much traffic that a tail restart is a big
> deal, although log rotation could lead to gaps we don't like.
> Of course, this is why Logwatch was written...
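The log rotation gap mentioned above can be narrowed if the tailing process notices that the path now points at a different file. A minimal Python sketch, assuming a POSIX filesystem and rename-and-recreate rotation (not copytruncate); `was_rotated` and `read_new_lines` are hypothetical helper names, not part of any tool named in this thread:

```python
import os

def was_rotated(handle, path):
    """Return True if `path` no longer refers to the file that `handle` has open."""
    try:
        current = os.stat(path)
    except FileNotFoundError:
        return True  # old file was moved away and the new one does not exist yet
    opened = os.fstat(handle.fileno())
    return (current.st_ino, current.st_dev) != (opened.st_ino, opened.st_dev)

def read_new_lines(handle, path):
    """Read pending lines; after a rotation, switch to the start of the new file."""
    lines = handle.readlines()
    if was_rotated(handle, path) and os.path.exists(path):
        # Pick up anything written to the old file just before the rename.
        lines += handle.readlines()
        handle.close()
        handle = open(path, "r")
        lines += handle.readlines()
    return handle, lines
```

A caller would invoke `read_new_lines` in a loop with a short sleep, keeping the returned handle for the next call. This still gives at-least-once or at-most-once behaviour at process restart, not exactly-once.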
>> I like the scalar guide, but I disagree with their advice on active
>> monitoring. I think it's smarter to use real user requests to test if
>> servers are up. I have seen many high-profile sites that end up
>> serving more synthetic requests than real customer-initiated requests.
> I'm not sure I understood what you mean by "active monitoring". I've
> understood "sending http queries to see if they are handled properly".
> In that context: I think both submitting queries (from outside one's
> own network) and passively watching stats on the service itself are
> essential. Passively watching stats gives me information on internal
> state, useful in itself but also when debugging problems. Active
> monitoring from a different network can alert me to problems that may
> not be specific to any one service, and may even be at the network level.
> Of course, yes, active monitoring shouldn't be trying to DoS my
> service. ;-)
> Jeff Abrahamson
>> On 11 Apr 2018, at 12:19 AM, Jeff Abrahamson <jeff at p27.eu> wrote:
>> I want to monitor nginx better: http returns (e.g., how many
>> 500's, how many 404's, how many 200's, etc.), as well as request
>> rates, response times, etc. All the solutions I've found start
>> with "set up something to watch and parse your logs, then ..."
>> Here's one of the better examples of that:
>> Perhaps I'm wrong to find this curious. It seems somewhat heavy
>> and inefficient to put this functionality into log watching,
>> which means another service and being sensitive to an eventual
>> change in log format.
>> Is this, indeed, the recommended solution?
>> And, for my better understanding, can anyone explain why this
>> makes more sense than native nginx support of sending UDP
>> packets to a monitor collector (in our case, telegraf)?
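To make the log-parsing approach in the question concrete, here is a small Python sketch that counts HTTP status codes. It assumes nginx's default "combined" log format, where the status follows the quoted request line; the regex and the name `count_statuses` are illustrative assumptions. As the question anticipates, a custom `log_format` would require changing the pattern.

```python
import re
from collections import Counter

# In the "combined" format the first quoted field is the request line,
# followed by the status code, e.g.: "GET / HTTP/1.1" 200 612
STATUS_RE = re.compile(r'"[^"]*" (\d{3}) ')

def count_statuses(lines):
    """Count occurrences of each HTTP status code; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding the function an iterable of log lines returns a `Counter` keyed by status string, for example `counts["404"]`, which can then be forwarded to a collector at whatever interval suits.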
>> Jeff Abrahamson