high "Load Average"

Sessna nginx-forum at nginx.us
Sun Mar 14 12:15:39 MSK 2010

Cliff Wells Wrote:
> This means that php-cgi is spending 99.99% of its time in I/O, not that
> it's creating 99.99% of your I/O. It means php-cgi is waiting almost
> all the time.

This is reasonable and fully matches what the PHP script does: gather data for querying MySQL, issue a prepared query and wait for the results.
Given that MySQL is running on another box, PHP spends almost all of its time waiting for results.

> Yes, and this is certainly what is happening. 
> You have to remember that "load" is an indication of how many processes are waiting to be
> scheduled, and this includes processes that are waiting on I/O. 
> In this case it looks like PHP isn't generating a lot of I/O, rather it is
> *waiting*, creating a high load, 

Agreed, "load" indicates the average number of processes that are running, runnable or in uninterruptible sleep.
PHP waiting for a network response must then be in uninterruptible sleep (it cannot be running and is hardly runnable).
So waiting PHP processes can make the "load average" very high.
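
To check this rather than assume it, the process states can be read straight from /proc. Below is a minimal sketch, assuming a Linux box; the function names are made up for illustration and do not come from any existing tool. As far as I know, the Linux load average counts tasks in state R (running or runnable) and D (uninterruptible sleep), while a process blocked in a socket read normally shows up as S (interruptible sleep), so it is worth looking at which state the php-cgi processes are really in.

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split after the LAST closing parenthesis.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def count_states(proc_root="/proc"):
    """Count processes per state letter (R, S, D, ...)."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                counts[parse_state(f.read())] += 1
        except OSError:
            # The process exited between listdir() and open()/read().
            continue
    return counts


# Only touch /proc when it is actually there (i.e. on Linux).
if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print("load averages:", parse_loadavg(f.read()))
    states = count_states()
    print("states:", dict(states))
    print("R + D right now:", states["R"] + states["D"])
```

This counts processes, not individual threads, and it is a single snapshot, so "R + D" will only roughly follow the 1-minute load average. If most php-cgi processes show S while the load stays high, the load must be coming from somewhere else.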

> and at the same time, you are running out of PHP processes to handle new requests, 

Correct me if I am wrong, but when nginx cannot find a free PHP process, a "11: Resource temporarily unavailable while connecting to upstream" message is written to error.log.
I saw plenty of these during the initial system setup, but since adding more php-fpm children all goes well: no such messages in the logs and no "5xx error" pages reported by users for a long time.
I am monitoring the logs carefully, so I suppose the number of PHP processes is OK.
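
The log check itself can be scripted. A minimal sketch, assuming the usual nginx error.log layout where each line starts with a "YYYY/MM/DD HH:MM:SS" timestamp; the sample lines, socket path and file names below are made up for illustration only.

```python
from collections import Counter

# Substrings taken from the message quoted above.
EAGAIN = "11: Resource temporarily unavailable"
UPSTREAM = "while connecting to upstream"


def eagain_per_hour(lines):
    """Count matching error.log lines, keyed by 'YYYY/MM/DD HH'."""
    per_hour = Counter()
    for line in lines:
        if EAGAIN in line and UPSTREAM in line:
            per_hour[line[:13]] += 1  # first 13 chars = date plus hour
    return per_hour


# Made-up sample lines, NOT taken from a real log.
sample = [
    "2010/03/14 12:01:07 [error] 4321#0: *99 connect() to unix:/tmp/php.sock "
    "failed (11: Resource temporarily unavailable) while connecting to upstream",
    "2010/03/14 12:40:55 [error] 4321#0: *150 connect() to unix:/tmp/php.sock "
    "failed (11: Resource temporarily unavailable) while connecting to upstream",
    "2010/03/14 12:41:00 [error] 4321#0: *151 open() \"/var/www/x.png\" "
    "failed (2: No such file or directory)",
    "2010/03/14 13:02:10 [error] 4321#0: *200 connect() to unix:/tmp/php.sock "
    "failed (11: Resource temporarily unavailable) while connecting to upstream",
]

print(dict(eagain_per_hour(sample)))
```

For a real log, pass an open file instead of the sample list, e.g. `eagain_per_hour(open("error.log"))`. An empty result over the busy hours would support the idea that the pool of PHP processes is large enough.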

> which is why the box slows down.
Not sure about that yet.

>I'd follow these steps:
> 1) check the state of the MySQL server. See if this machine is overloaded.

The MySQL box is fine: no excessively high load and no slowdown of any kind.

>2) check MySQL itself. Turn on query logging and see if you have some
>query that's taking a long time to complete.

There are several queries that can take up to several seconds to complete; I am looking into optimizing them.

>3) take a look at your network to make sure that there isn't an issue
>there (saturation, packet loss, etc).

Can you give some advice on how to check the saturation of the link between these two boxes?
The boxes stand next to each other and network transmission looks fine as well.

Maybe I am missing something, but I see the general picture as follows:
Machine A hosts N processes waiting for a network reply (blocked on recv(), I assume).
Machine B is a "black box" running something that produces results for machine A. At the moment it doesn't matter what it is or how long it takes to produce a result; let's assume time T.
Increasing the number of waiting processes on machine A slows it down.
That sounds strange, though. I thought a sleeping process only consumes memory and doesn't affect performance. Maybe it is something related to task scheduling? Does a large number of sleeping processes hurt performance and/or slow down the scheduler?
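
The "sleeping costs nothing" part can at least be tested in isolation. A rough sketch with a simplification that should be kept in mind: it uses threads inside one Python process instead of separate php-cgi processes, and local socket pairs instead of a real MySQL connection. It blocks a number of waiters in recv() and measures how much CPU time the whole process burns while they wait.

```python
import socket
import threading
import time


def cpu_used_while_blocked(n_waiters=50, wall_seconds=1.0):
    """Block n_waiters threads in recv() and return CPU seconds used meanwhile."""
    pairs = [socket.socketpair() for _ in range(n_waiters)]
    threads = [
        threading.Thread(target=reader.recv, args=(1,)) for reader, _ in pairs
    ]
    for t in threads:
        t.start()

    cpu_before = time.process_time()   # user + system CPU of this process
    time.sleep(wall_seconds)           # the waiters stay blocked in recv()
    cpu_used = time.process_time() - cpu_before

    for _, writer in pairs:
        writer.send(b"x")              # wake every waiter so it can exit
    for t in threads:
        t.join()
    for reader, writer in pairs:
        reader.close()
        writer.close()
    return cpu_used


print("CPU seconds used while waiting:", cpu_used_while_blocked())
```

If the printed value stays close to zero even with many waiters, blocked readers by themselves are not what eats the CPU, and the slowdown would have to be explained by something else on machine A.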


Posted at Nginx Forum: http://forum.nginx.org/read.php?2,63176,63645#msg-63645
