high "Load Average"

Stefan Parvu sparvu at systemdatarecorder.org
Tue Mar 16 19:35:10 MSK 2010


> 
>     Time      Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs %Util    Sat
> 10:24:02       lo    0.33    0.33   290.8   290.8    1.16    1.16  0.00   0.00
> 10:24:02     eth0    0.29    0.79  1183.8  1448.1    0.25    0.56  0.00   0.00
> 

Make sure you run enicstat, since %Util and Sat are always 0 on Linux if you don't:

"
Added a script, enicstat, which uses ethtool to get speeds and duplex modes for all interfaces, then calls nicstat with an appropriate -S value."

http://blogs.sun.com/timc/entry/nicstat_the_solaris_and_linux
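
If you don't have the enicstat wrapper handy, here is a minimal sketch
of doing the same by hand, assuming a 1 Gbit/s full-duplex eth0 (check
the real interface name, speed and duplex with ethtool first):

  # report the negotiated speed and duplex for eth0
  ethtool eth0 | egrep 'Speed|Duplex'

  # tell nicstat the link speed so it can compute %Util; the -S argument
  # takes interface:Mbit/s with an fd/hd duplex suffix
  nicstat -S eth0:1000fd 1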


> What I am talking about is a little bit different. In peak hours response time degrades significantly, but it is still more or less acceptable. What is unacceptable is that machine A slows down and responds to external actions (like SSH login or VPN connection) very slowly. For example, I sometimes can't even establish a VPN connection to it due to timeouts (there is an openvpn server running on it). That's why I am talking about "slow machine A" and blame it, and that's why I am worried about processes in "uninterruptible sleep" and thinking about scheduling lag.


You are talking about a system slowdown caused by your current
workload. This can be caused by a number of things, some of them
related to the kernel. Most likely, analysing what is going on here
with SystemTap will help. That's why I keep telling people DTrace is
like a gold mine, or a step into the future. Since you are not on
Solaris, you need to start looking into SystemTap. If possible, have
a box with Solaris or FreeBSD next to it running this workload and
check it with the DTraceToolkit.
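
As a starting point, here is a minimal SystemTap sketch (assuming the
systemtap package and your kernel's debuginfo are installed) that counts
which processes issue block I/O over a 10 second window, a rough way to
see which workload is driving the I/O that puts processes into
uninterruptible sleep:

  stap -e '
  global reqs
  probe ioblock.request { reqs[execname()]++ }
  probe timer.s(10) {
    foreach (name in reqs- limit 10)
      printf("%-20s %d block I/O requests\n", name, reqs[name])
    exit()
  }'

The probe names come from the standard ioblock and timer tapsets; adjust
the window and the limit to taste.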

Good luck,
stefan


