[PATCH] Use BPF to distribute packets to different worker threads.

Mikhail Isachenkov mikhail.isachenkov at nginx.com
Mon Sep 21 11:29:06 UTC 2020


Hi Liu Qiao,

We've tested an early version of the patch with the same BPF code (on 
AWS cloud, without ADQ-capable cards) on relatively small payloads and 
found no significant difference. We'd like to retest it with a large 
payload size; could you please elaborate a bit more on how you 
performed the test? I mean the 'nginx -T' output, the number of CPU 
cores and the CPU model, the test scripts, the 1-megabyte file and any 
system tuning parameters. One caveat is that wrk itself may produce a 
significant client load, and the most effective way to distribute the 
client load between CPU cores is to run wrk in a single thread pinned 
via taskset, as in the example below.
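
For instance, to pin a single-threaded wrk run to one core (the URL 
and the numbers here are placeholders, to be adapted to your setup):

    taskset -c 0 wrk -t 1 -c 100 -d 30s http://<server>/1m.bin

This keeps the whole client load on one known core, so it doesn't 
interfere with the distribution of requests between workers.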

Another big caveat that I found during my tests is the strange 
behavior of this BPF code: when the client and the server run on the 
same host, all requests are served by a single nginx worker process. 
Did you try to run wrk and nginx locally in your tests?
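
For reference, the following is a minimal sketch of the kind of 
SK_REUSEPORT program being discussed (an illustration only, not the 
exact code from the patch; the map size, names and fallback policy 
are assumptions):

    /* Distribute incoming connections between the listening sockets
     * of a reuseport group using the kernel's 4-tuple hash.
     * Requires Linux 4.19+; userspace must fill the sockarray with
     * one listening socket per worker. */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    #define MAX_WORKERS 64          /* assumption: upper worker count */

    struct {
        __uint(type, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY);
        __uint(max_entries, MAX_WORKERS);
        __type(key, __u32);
        __type(value, __u64);
    } workers SEC(".maps");

    SEC("sk_reuseport")
    int select_worker(struct sk_reuseport_md *md)
    {
        /* md->hash is the kernel's hash of the connection 4-tuple;
         * if it barely varies (e.g. over loopback), every connection
         * ends up in the same slot and thus the same worker. */
        __u32 key = md->hash % MAX_WORKERS;

        /* If the slot is empty the call fails; returning SK_PASS
         * without a selected socket lets the kernel fall back to its
         * default reuseport distribution. */
        bpf_sk_select_reuseport(md, &workers, &key, 0);

        return SK_PASS;
    }

    char _license[] SEC("license") = "GPL";

If md->hash doesn't vary enough for loopback connections, a program 
like this would indeed send all local requests to a single worker, 
which could explain the behavior above.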

Thanks in advance!

On 15.09.2020 05:08, Liu, Qiao wrote:
> Below is a comparison of 5 test runs: 112 threads, 10000 connections, 
> 1M object HTTP requests. P99 shows a great improvement, and Max is 
> also reduced:
> 
> 
> 
>                      AVG       Stdev       Max      P99
> BPF      test 1     1.32s     447.09ms    5.48s    2.82s
>          test 2     1.39s     513.8ms     9.42s    3.1s
>          test 3     1.4s      341.38ms    5.63s    2.55s
>          test 4     1.41s     407.45ms    6.96s    2.77s
>          test 5     1.29s     644.81ms    9.45s    3.74s
>          Average    1.362s    470.906ms   7.388s   2.996s
> 
> NonBPF   test 1     1.48s     916.88ms    9.44s    5.08s
>          test 2     1.43s     658.48ms    9.54s    3.92s
>          test 3     1.41s     650.38ms    8.63s    3.59s
>          test 4     1.29s     1010ms      10s      5.21s
>          test 5     1.31s     875.01ms    9.53s    4.39s
>          Average    1.384s    822.15ms    9.428s   4.438s
> 
> 
> Thanks
> LQ
> -----Original Message-----
> From: nginx-devel <nginx-devel-bounces at nginx.org> On Behalf Of Liu, Qiao
> Sent: Monday, September 14, 2020 9:18 AM
> To: nginx-devel at nginx.org
> Subject: RE: [PATCH] Use BPF to distribute packets to different worker threads.
> 
> Hi, Maxim Dounin:
> Thanks for your reply, this server is random selected, we just do BPF and no-BPF test, I think the latency based on server configuration, not related with BPF patch, also the NIC of the server is Mellanox, not ADQ capable hardware , we will do more test Thanks LQ
> 
> -----Original Message-----
> From: nginx-devel <nginx-devel-bounces at nginx.org> On Behalf Of Maxim Dounin
> Sent: Monday, September 14, 2020 7:40 AM
> To: nginx-devel at nginx.org
> Subject: Re: [PATCH] Use BPF to distribute packets to different worker threads.
> 
> Hello!
> 
> On Fri, Sep 11, 2020 at 05:41:47AM +0000, Liu, Qiao wrote:
> 
>> Hi, Vladimir Homutov:
>> Below is our wrk test result output with BPF enabled:
>>
>>    112 threads and 10000 connections
>>    Thread Stats   Avg      Stdev     Max   +/- Stdev
>>      Latency   608.23ms  820.71ms  10.00s    87.48%
>>      Connect    16.52ms   54.53ms   1.99s    94.73%
>>      Delay     153.13ms  182.17ms   2.00s    90.74%
>>      Req/Sec   244.79    142.32     1.99k    68.40%
>>    Latency Distribution
>>    50.00%  293.50ms
>>    75.00%  778.33ms
>>    90.00%    1.61s
>>    99.00%    3.71s
>>    99.90%    7.03s
>>    99.99%    8.94s
>>    Connect Distribution
>>    50.00%    1.93ms
>>    75.00%    2.85ms
>>    90.00%   55.76ms
>>    99.00%  229.19ms
>>    99.90%  656.79ms
>>    99.99%    1.43s
>>    Delay Distribution
>>    50.00%  110.96ms
>>    75.00%  193.67ms
>>    90.00%  321.77ms
>>    99.00%  959.27ms
>>    99.90%    1.57s
>>    99.99%    1.91s
>> Compared with no BPF but with reuseport enabled, as below:
>>
>> 112 threads and 10000 connections
>>    Thread Stats   Avg      Stdev     Max   +/- Stdev
>>      Latency   680.50ms  943.69ms  10.00s    87.18%
>>      Connect    58.44ms  238.08ms   2.00s    94.58%
>>      Delay     158.84ms  256.28ms   2.00s    90.92%
>>      Req/Sec   244.51    151.00     1.41k    69.67%
>>    Latency Distribution
>>    50.00%  317.61ms
>>    75.00%  913.52ms
>>    90.00%    1.90s
>>    99.00%    4.30s
>>    99.90%    6.52s
>>    99.99%    8.80s
>>    Connect Distribution
>>    50.00%    1.88ms
>>    75.00%    2.21ms
>>    90.00%   55.94ms
>>    99.00%    1.45s
>>    99.90%    1.95s
>>    99.99%    2.00s
>>    Delay Distribution
>>    50.00%   73.01ms
>>    75.00%  190.40ms
>>    90.00%  387.01ms
>>    99.00%    1.34s
>>    99.90%    1.86s
>>    99.99%    1.99s
>>
>>
>> The above results show an almost 20% latency reduction: P99 latency
>> with BPF is 3.71s, while without BPF it is 4.3s.
> 
> Thank you for the results.
> 
> Given that the latency stdev is way higher than the average latency, 
> I don't think the observed "20% latency reduction" is statistically 
> significant.  Please try running several tests and use ministat(1) to 
> check the results, as in the example below.
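> 
> For example (a sketch, with placeholder file names), with the per-run 
> average latencies saved one number per line:
> 
>     $ ministat bpf.txt nobpf.txt
> 
> ministat prints the mean, stddev and a confidence interval for each 
> dataset, and reports whether the difference between them is 
> statistically significant.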
> 
> Also, the latency values look very high and the request rate very 
> low.  What's on the server side?
> 
> --
> Maxim Dounin
> http://mdounin.ru/
> _______________________________________________
> nginx-devel mailing list
> nginx-devel at nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx-devel
> 

-- 
Best regards,
Mikhail Isachenkov
NGINX Professional Services

