nginx in high concurrency setups

Wed Dec 16 05:04:53 MSK 2009

Thanks for the pointer. Right now the setup I'm testing with looks like 
this (both the host and the VMs run CentOS 5.4):

2 virtual machines with 1 vCPU and 1 GB of RAM each. These are balanced by 
LVS-DR running on the host system, using a weighted round-robin scheduler 
with persistence disabled.
The payload is 50,000 files of random characters, each exactly 1 KB in size, 
distributed across 50 directories with 1000 files each.
On the nginx side I'm pretty much running the default config right now, 
with access logging disabled (1 worker process and events 
{worker_connections  1024;} ).
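
For reference, a payload like the one described above could be generated 
roughly as follows. This is only a sketch, not the script actually used for 
the test: the directory and file naming scheme, the character set, and the 
reading of "1 KB" as 1024 bytes are all assumptions.

```python
import os
import random
import string
import tempfile

FILE_SIZE = 1024  # assumption: "1 KB" means 1024 bytes
ALPHABET = string.ascii_letters + string.digits  # assumption: what "random characters" means


def generate_payload(root, n_dirs=50, files_per_dir=1000, size=FILE_SIZE, seed=None):
    """Create n_dirs directories under root, each holding files_per_dir files
    of `size` random ASCII characters. Returns the number of files written."""
    rng = random.Random(seed)
    written = 0
    for d in range(n_dirs):
        dir_path = os.path.join(root, "%02d" % d)  # hypothetical naming scheme
        os.makedirs(dir_path, exist_ok=True)
        for f in range(files_per_dir):
            data = "".join(rng.choices(ALPHABET, k=size))
            path = os.path.join(dir_path, "%04d.txt" % f)  # hypothetical file name
            with open(path, "w", encoding="ascii", newline="") as fh:
                fh.write(data)
            written += 1
    return written


if __name__ == "__main__":
    # Small demo in a throwaway directory; point `root` at the document
    # root and keep the defaults to get the full 50 x 1000 layout.
    with tempfile.TemporaryDirectory() as tmp:
        print(generate_payload(tmp, n_dirs=2, files_per_dir=3, seed=1))
```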

The machine I'm testing from is connected to the same gigabit switch as the 
host with the 2 VMs. The client machine runs the default setup, but the 
load-balanced IP is excluded from connection tracking in the iptables firewall.
The VMs have their firewalls completely disabled and have the following 
sysctl settings applied:

net.ipv4.tcp_fin_timeout = 10
net.core.somaxconn = 10000
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_tw_buckets = 400000
net.ipv4.tcp_max_syn_backlog = 10240
net.ipv4.tcp_synack_retries = 3
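
A quick way to confirm that settings like these are actually live on each 
VM is to compare them against /proc/sys. The sketch below assumes a Linux 
system where each sysctl key maps to a file under /proc/sys with the dots 
replaced by slashes; the helper names are made up for illustration.

```python
SETTINGS = """\
net.ipv4.tcp_fin_timeout = 10
net.core.somaxconn = 10000
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_tw_buckets = 400000
net.ipv4.tcp_max_syn_backlog = 10240
net.ipv4.tcp_synack_retries = 3
"""


def parse_sysctl(text):
    """Parse 'key = value' lines into {key: [value tokens]}."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        settings[key.strip()] = value.split()
    return settings


def proc_path(key):
    """Map a sysctl key to its /proc/sys path (assumes no dots inside a component)."""
    return "/proc/sys/" + key.replace(".", "/")


def check(settings):
    """Return {key: (wanted, actual)} for keys whose live value differs.
    `actual` is None when the file cannot be read (e.g. not on Linux)."""
    mismatches = {}
    for key, wanted in settings.items():
        try:
            with open(proc_path(key)) as fh:
                actual = fh.read().split()
        except OSError:
            actual = None
        if actual != wanted:
            mismatches[key] = (wanted, actual)
    return mismatches


if __name__ == "__main__":
    for key, (wanted, actual) in check(parse_sysctl(SETTINGS)).items():
        print(key, "wanted", wanted, "got", actual)
```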

The "tcp_tw_reuse" setting really helped with my initial TIME_WAIT problem.
My "siege" results now look like this:

[root@virt1 ~]# siege -b -c 250
** SIEGE 2.69
** Preparing 250 concurrent users for battle.
The server is now under siege...
Lifting the server siege...      done.
Transactions:		      420203 hits
Availability:		      100.00 %
Elapsed time:		       24.34 secs
Data transferred:	      410.36 MB
Response time:		        0.01 secs
Transaction rate:	    17263.89 trans/sec
Throughput:		       16.86 MB/sec
Concurrency:		      239.56
Successful transactions:      420204
Failed transactions:	           0
Longest transaction:	       21.00
Shortest transaction:	        0.00
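
As a sanity check, the derived figures in the siege report can be recomputed 
from the raw ones. The sketch below assumes that siege's "Concurrency" value 
is the average number of requests in flight, in which case Little's law 
(concurrency = rate x mean response time) gives a finer-grained mean response 
time than the rounded "0.01 secs" in the report.

```python
# Raw figures copied from the siege report above.
transactions = 420203
elapsed_s = 24.34
data_mb = 410.36
concurrency = 239.56  # assumed to be the average number of in-flight requests

rate = transactions / elapsed_s          # transactions per second
throughput = data_mb / elapsed_s         # MB per second
# Little's law: concurrency = rate * mean_response_time
mean_response_ms = concurrency / rate * 1000

print("rate:          %.2f trans/sec" % rate)
print("throughput:    %.2f MB/sec" % throughput)
print("mean response: %.1f ms" % mean_response_ms)
```

The first two values match the report (17263.89 trans/sec and 16.86 MB/sec), 
and the mean response time comes out just under 14 ms, in line with the 
median reported by "ab".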

The nginx version I'm running is 0.7.64.

What I'm wondering about at the moment are these stray requests that take 
much longer than 99% of the others. Here is a distribution with "ab":

Percentage of the requests served within a certain time (ms)
   50%     11
   66%     15
   75%     17
   80%     18
   90%     21
   95%     23
   98%     26
   99%     29
  100%   3024 (longest request)

As you can see, 99% of the requests are delivered in 29ms or less, but most 
of the time there is at least one request that takes 3s (and always pretty 
much exactly 3s, at least with "ab").
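
One way to narrow down such stray requests is to check whether the 3s are 
spent while connecting or only afterwards. The sketch below assumes "ab" was 
run with -g so that it writes a tab-separated file with "ctime" (connect 
time) and "ttime" (total time) columns in milliseconds; the sample rows are 
made up and are not measured data. If the slow requests show a connect time 
close to 3000 ms, the time is lost before nginx ever sees the request (for 
example in a retransmitted SYN), but that is only a hypothesis here.

```python
import csv
import io

# Made-up sample in the layout assumed for `ab -g` output (tab-separated).
SAMPLE = (
    "starttime\tseconds\tctime\tdtime\tttime\twait\n"
    "Wed Dec 16 05:00:00 2009\t1260928800\t0\t11\t11\t10\n"
    "Wed Dec 16 05:00:00 2009\t1260928800\t1\t17\t18\t16\n"
    "Wed Dec 16 05:00:01 2009\t1260928801\t3000\t24\t3024\t23\n"
)


def slow_requests(tsv_text, threshold_ms=1000):
    """Return the requests whose total time is at least threshold_ms,
    noting whether most of that time was spent in connect."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    slow = []
    for row in reader:
        ttime = int(row["ttime"])
        if ttime >= threshold_ms:
            ctime = int(row["ctime"])
            slow.append({
                "ttime": ttime,
                "ctime": ctime,
                # True when connect accounts for more than half of the total
                "connect_dominated": ctime * 2 > ttime,
            })
    return slow


if __name__ == "__main__":
    for req in slow_requests(SAMPLE):
        print(req)
```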

Any ideas for further optimizations? Should I maybe choose a different 
event model for this particular load (lots of small, short-lived requests)?
Also, when I tried configuring 2 worker processes it looked like only 1 CPU 
was really under load. Since the second vCPU didn't get utilised well, I 
reduced the VMs to 1 vCPU each, created the second VM and added the 
load balancing to get a more even load distribution.

Regards,
   Dennis

On 12/15/2009 04:56 PM, Rasmus Andersson wrote:
> Richard Jones of Last.fm fame has written some about this kind of testing:
> http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1
>
> On Mon, Dec 14, 2009 at 21:16, merlin corey<merlincorey at dc949.org>  wrote:
>> On Mon, Dec 14, 2009 at 11:16 AM, Dennis J.<dennisml at conversis.de>  wrote:
>>> Hi,
>>> I'm currently experimenting how many concurrent connections nginx can
>>> handle. The problem I'm running into is that for each request I send to the
>>> server I get a connection in TIME_WAIT state. If I do this using
>>> benchmarking tools like httperf or ab I quickly seem to hit a ceiling. Once
>>> the number of TIME_WAIT connections reaches about 16000 the benchmarking
>>> tools just freeze and I have to wait until that number comes down again.
>>> What is the reason for these TIME_WAIT connections and how can I get rid of
>>> them faster? I'm only serving small static files and the delivery is not
>>> supposed to take longer than say 300ms so any connection that takes longer
>>> than that can be aborted if that is necessary to make room for new incoming
>>> connections.
>>> Does anyone have experience with serving lots of small static requests using
>>> nginx?
>>>
>>> Regards,
>>>   Dennis
>>>
>>> _______________________________________________
>>> nginx mailing list
>>> nginx at nginx.org
>>> http://nginx.org/mailman/listinfo/nginx
>>>
>>
>>
>> Hello,
>>
>> You will need to tune your OS's TCP and socket settings, I do believe.
>>   It is dependent on your OS what exactly you must do.
>>
>> Also, keep in mind, that when you are doing these tests, ideally you
>> should be sending the test-load from multiple machines that are not
>> the same machine that is serving.  This is to rule out the
>> benchmarking program fighting for resources with nginx and to rule out
>> a single machine's ceilings.
>>
>> Thanks,
>> Merlin