worker cpu balance
Aleksandar Lazic
al-nginx at none.at
Sun Apr 13 05:04:20 MSD 2008
Hi all,
during testing with the 10 Gbps network cards that Myricom donated to the
haproxy project http://haproxy.1wt.eu/ I asked the author of this
nice piece of software whether he would be so pleasant as to run a test
with nginx instead of Tux. He was ;-))
Here is the description of his test with haproxy,
http://haproxy.1wt.eu/10g.html, and now what he sent me back from
his tests with nginx.
---
It works fast. Since it uses sendfile, it is as fast as Tux on large
files (>= 1MB), and saturates 10 Gbps with 10% of CPU with 1MB files.
However, it does not scale on multiple CPUs, whatever the number of
worker_processes. I've tried 1, 2, 8, ... The processes are all there,
but something is preventing them from sharing a resource, since the
machine never goes beyond 50% CPU used (it's a dual core). Sometimes,
"top" looks like this:
Tasks: 189 total, 3 running, 186 sleeping, 0 stopped, 0 zombie
Cpu0 : 40.3% user, 55.2% system, 0.0% nice, 4.5% idle, 0.0% IO-wait
Cpu1 : 2.7% user, 1.3% system, 0.0% nice, 96.0% idle, 0.0% IO-wait
Mem: 2072968k total, 92576k used, 1980392k free, 11604k buffers
Swap: 0k total, 0k used, 0k free, 25656k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ Command
1984 nobody 20 0 2980 996 492 S 34.7 0.0 0:49.85 nginx.bin
1986 nobody 20 0 2980 992 488 S 34.7 0.0 0:51.91 nginx.bin
1980 nobody 20 0 2980 996 492 S 25.8 0.0 0:47.29 nginx.bin
1983 nobody 20 0 2980 996 492 S 2.0 0.0 0:48.07 nginx.bin
1988 nobody 20 0 2980 996 492 R 2.0 0.0 0:45.75 nginx.bin
Sometimes it looks like this:
Tasks: 188 total, 2 running, 186 sleeping, 0 stopped, 0 zombie
Cpu0 : 12.7% user, 12.7% system, 0.0% nice, 74.6% idle, 0.0% IO-wait
Cpu1 : 32.4% user, 39.4% system, 0.0% nice, 28.2% idle, 0.0% IO-wait
Mem: 2072968k total, 92820k used, 1980148k free, 11604k buffers
Swap: 0k total, 0k used, 0k free, 25660k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ Command
1985 nobody 20 0 2980 996 492 R 53.7 0.0 0:48.14 nginx.bin
1982 nobody 20 0 2980 992 488 S 31.8 0.0 0:39.40 nginx.bin
1986 nobody 20 0 2980 992 488 S 8.0 0.0 0:54.71 nginx.bin
1988 nobody 20 0 2980 996 492 S 5.0 0.0 0:48.79 nginx.bin
1983 nobody 20 0 2980 996 492 S 2.0 0.0 0:52.20 nginx.bin
Rather strange.
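For what it's worth, this pattern (all workers present but the machine capped near 50% CPU) would be consistent with the workers serializing on nginx's accept mutex. A sketch of settings one could experiment with; the directives exist, but whether they actually fix the imbalance here is untested:

```nginx
# Sketch only: pin the two workers to separate cores and try accepting
# connections without the serializing mutex. Values assume a dual-core box.
worker_processes    2;
worker_cpu_affinity 01 10;   # first worker -> CPU0, second worker -> CPU1

events {
    accept_mutex off;        # default is on; off lets every worker call accept()
}
```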
---
I have seen the same behaviour.
Here is the description of my setup:
### cat /proc/cpuinfo of both machines
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
stepping : 6
cpu MHz : 2400.075
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 4802.73
clflush size : 64
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
stepping : 6
cpu MHz : 2400.075
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 4800.13
clflush size : 64
###
free -m (no swap usage)
total used free shared buffers cached
Mem: 2027 1132 895 0 155 828
###
When I run ab I get the following:
close:
ab -n 40000 -c 2500 http://192.168.1.17:8080/10k
Document Length: 10240 bytes
Concurrency Level: 2500
Complete requests: 40000
Server Software: lighttpd/1.5.0
Time taken for tests: 7.807093 seconds
Requests per second: 5123.55 [#/sec] (mean)
Server Software: nginx/0.6.29
Time taken for tests: 8.96004 seconds
Requests per second: 4940.71 [#/sec] (mean)
lighttpd uses both workers with similar load.
nginx uses both workers, but one runs at ~70-80% and the other at only
~20-30% CPU.
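To put a number on the split between workers, a batch-mode top snapshot can be fed through a small awk filter. This is a sketch; it assumes top's column layout as shown above (PID in column 1, %CPU in column 9) and that the workers appear as nginx.bin:

```shell
# worker_share: print each worker's share of the total worker CPU time.
# Live usage (assumes the process name nginx.bin, as in the listings above):
#   top -b -n 1 | grep nginx.bin | worker_share
worker_share() {
    awk '{ total += $9; cpu[$1] = $9 }
         END { for (pid in cpu)
                   printf "pid %s: %.1f%% of worker CPU\n", pid, 100 * cpu[pid] / total }'
}
```

On a balanced server every worker should report roughly the same share; the numbers above would give one worker well over half.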
keep alive:
ab -n 40000 -c 2500 -k http://192.168.1.17:8080/10k
Server Software: lighttpd/1.5.0
Time taken for tests: 6.625588 seconds
Requests per second: 6037.20 [#/sec] (mean)
Server Software: nginx/0.6.29
Time taken for tests: 6.732870 seconds
Requests per second: 5941.00 [#/sec] (mean)
lighttpd uses both workers, but not equally.
nginx uses only one worker.
Well, with a 10k file it's easy, but what is the behaviour with a 1M file?
ab -n 4000 -c 250 -k http://192.168.1.17:8080/1M
Document Length: 1048576 bytes
Concurrency Level: 250
Complete requests: 4000
Server Software: lighttpd/1.5.0
Time taken for tests: 59.870157 seconds
Keep-Alive requests: 3909
Requests per second: 66.81 [#/sec] (mean)
Server Software: nginx/0.6.29
Time taken for tests: 59.899784 seconds
Keep-Alive requests: 4000
Requests per second: 66.78 [#/sec] (mean)
lighttpd uses both workers with similar load.
nginx again uses only one worker, the first one.
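For anyone who wants to reproduce these runs: the two test files are just fixed-size blobs and can be generated with dd (writing to the current directory is my assumption; copy them to wherever your server's docroot points):

```shell
# Generate the 10k and 1M test files used in the ab runs above.
dd if=/dev/zero of=10k bs=1024 count=10     # 10240 bytes, matches "Document Length: 10240"
dd if=/dev/zero of=1M  bs=1024 count=1024   # 1048576 bytes, matches "Document Length: 1048576"
```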
A different picture shows up when I use inject,
http://1wt.eu/tools/inject/
---nginx
Clients : 5499
Hits : 178202 + 0 aborted
Bytes : 3863054678
Duration : 61014 ms
Throughput : 63314 kB/s
Response : 2920 hits/s
Errors : 0
Timeouts: 0
Average hit time: 1729965.8 ms
Average time for a complete page: 9067.0 ms
Start date: 1208046930 (13 Apr 2008 - 2:35:30)
Command line : ./inject29 -l -n 40000 -p 2 -o 8 -u 2500 -s 20 -G 192.168.1.17:8080/1M -d 60
---lighty
Clients : 5000
Hits : 57310 + 0 aborted
Bytes : 4022907712
Duration : 61010 ms
Throughput : 65938 kB/s
Response : 939 hits/s
Errors : 0
Timeouts: 0
Average hit time: 0.0 ms
Average time for a complete page: 0.0 ms
Start date: 1208047028 (13 Apr 2008 - 2:37:08)
Command line : ./inject29 -l -n 40000 -p 2 -o 8 -u 2500 -s 20 -G 192.168.1.17:8080/1M -d 60
With this testing tool both servers distribute the load across the
workers in a similar manner.
The first worker always gets the most 'load'.
Has anybody seen the same behaviour in the real world, or does this
happen only at test time?
You can get both config files from
http://none.at/lighttpd.conf
http://none.at/nginx.conf
BR
Aleks