high "Load Average"

Cliff Wells cliff at develix.com
Tue Mar 16 01:32:40 MSK 2010


On Sun, 2010-03-14 at 05:15 -0400, Sessna wrote:
> Cliff Wells Wrote:
> -------------------------------------------------------
> > This means that php-cgi is spending 99.99% of its time in I/O, not that
> > it's creating 99.99% of your I/O. It means php-cgi is waiting almost
> > all the time.
> 
> This is reasonable and fully corresponds to the PHP script's action pattern: gather data for querying MySQL, issue a prepared query and wait for results.
> Taking into account that MySQL is running on another box, PHP is waiting almost all the time for results.

Well, all it means is that the bottleneck lies outside PHP.   I wouldn't
read too much else into it.   It just means that 99% of your slowness is
elsewhere.

> > and at the same time, you are running out of PHP processes to handle new requests, 
> 
> Correct me if I am wrong, but when nginx has no PHP process ready, a "11: Resource temporarily unavailable while connecting to upstream" message is written to error.log.
> I saw plenty of these during initial system setup, but after adding php-fpm children all goes well: no such messages in the logs and no "5xx error" pages reported by users for a long time.
> I am monitoring the logs carefully, so I suppose the number of PHP processes is OK.

Maybe.   Or maybe you've simply disguised the problem by throwing more
processes at it.   How many PHP processes are you running?   Can you
provide your php-fpm parameters?   Also, what's an approximation of your
requests per second during these peak times?

> > which is why the box slows down.
> not sure yet
> 
> >I'd follow these steps:
> > 1) check the state of the MySQL server. See if this machine is overloaded.
> 
> MySQL box is fine and running, no excessively high load or any type of slowing down

Again, load isn't always indicative of "speed".   How are you sure it's
responding quickly?   Have you compared the times of queries during
periods of load vs idle times?
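
A rough sketch of that comparison, assuming you have already collected
per-query durations in milliseconds for an idle period and a peak period.
The numbers below are made up for illustration, and summarize() is a
hypothetical helper, not part of any MySQL or PHP tooling:

```python
import statistics

def summarize(samples_ms):
    """Return (median, p95) of a list of query durations in milliseconds."""
    ordered = sorted(samples_ms)
    # Nearest-rank 95th percentile: ceil(0.95 * n), computed with integers.
    rank = max(1, -(-95 * len(ordered) // 100))
    return statistics.median(ordered), ordered[rank - 1]

# Hypothetical timings; in practice, time the real query on the PHP box,
# for example by wrapping each call in a high-resolution timer.
idle_ms = [12, 14, 11, 13, 15, 12, 14, 13, 12, 16]
peak_ms = [15, 240, 18, 900, 22, 17, 1300, 19, 21, 450]

for label, samples in (("idle", idle_ms), ("peak", peak_ms)):
    median, p95 = summarize(samples)
    print(f"{label}: median={median} ms, p95={p95} ms")
```

If the median barely moves but the 95th percentile jumps under load, a
few slow queries are likely holding PHP processes hostage even though the
MySQL box looks "fine" on average.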

> >2) check MySQL itself. Turn on query logging and see if you have some
> >query that's taking a long time to complete.
> 
> There are several queries that can take up to several seconds to complete, looking into optimizations

Going out on a limb: would it be possible to temporarily replace these
slow queries with something faster?   That is, drop the query results
into a table and use that in place of the actual query?
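
To illustrate the idea, here is a minimal sketch that uses Python's
built-in sqlite3 module as a stand-in for MySQL (MySQL has a similar
CREATE TABLE ... SELECT statement). The orders and order_totals tables
and the totals_fast() helper are hypothetical names, not anything from
your schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES ('a', 10), ('a', 5), ('b', 7);
    -- Run the expensive query once and keep its result in a plain table.
    CREATE TABLE order_totals AS
        SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer;
""")

def totals_fast(conn):
    """Read the precomputed rows instead of re-running the aggregate."""
    return conn.execute(
        "SELECT customer, total FROM order_totals ORDER BY customer"
    ).fetchall()

print(totals_fast(conn))
```

The results go stale as soon as the underlying data changes, so this is
only a diagnostic: if the box stops slowing down with the precomputed
table in place, the slow queries are the culprit.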

I'd be curious to see the output of iotop on your MySQL server as well.

> >3) take a look at your network to make sure that there isn't an issue
> >there (saturation, packet loss, etc).
> 
> Can you give some advice on how to check the saturation of the link between these two boxes?
> The boxes are standing close to each other and network transmission looks fine as well.
> 
> Maybe I am missing something, but I see the general overview of the
> task as follows:
> Machine A hosts N processes waiting for network reply. (Blocked on
> recv(), I assume)
> Machine B is a "Black Box", running something which produces results
> for machine A. At the moment it doesn't matter what it is and how long
> it takes to produce a result. Let's assume time T.

I cannot possibly fathom how the length of T doesn't matter.   To me
that is probably the single most important data point to investigate.
The entire response time directly depends on T and you say it "doesn't
matter"?

> An increasing number of waiting processes on machine A slows it down.

Why do you think machine A is slow?   Machine A waits for machine B and
you blame machine A.   I don't see how you arrive at this conclusion.
To me it seems it could be either machine (or the connection between the
machines) and more testing needs to be done to arrive at such a
conclusion.   

I'd be more interested in how T varies as the number of sleeping
processes increases.
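
The relationship between T and the number of waiting processes can be
sketched with Little's Law (average requests in flight = arrival rate x
time per request). The 100 requests per second and the values of T below
are assumptions for illustration, not measurements from your setup:

```python
import math

def workers_needed(requests_per_second, ms_per_request):
    """Little's Law: requests in flight = arrival rate * time per request."""
    return math.ceil(requests_per_second * ms_per_request / 1000)

# Hypothetical load of 100 req/s with T at 50 ms, 500 ms and 2 s.
for t_ms in (50, 500, 2000):
    busy = workers_needed(100, t_ms)
    print(f"T={t_ms} ms -> about {busy} PHP processes busy waiting")
```

At a fixed request rate, the number of PHP processes stuck waiting grows
in direct proportion to T, which is why T is the number to measure
first.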

> Sounds strange, though. I thought that a sleeping process consumes
> memory only and doesn't impact performance. Maybe it is something
> related to task scheduling? Does a large number of sleeping processes
> impact performance and/or slow down the scheduler?

Not since kernel 2.6, AFAIK.   How many processes are we talking about
here?

As an aside, one thing that you mentioned earlier that I was wondering
about: what is writing to the local disk at 3.27MB/s (from iotop
output)?   

Cliff



