[PATCH] Optimal performance when using HTTP non-persistent connections
Maxim Dounin
mdounin at mdounin.ru
Thu Nov 21 15:21:47 UTC 2019
Hello!
On Thu, Nov 21, 2019 at 07:22:16PM +0800, Shaokun Zhang wrote:
> Hi Maxim,
>
> On 2019/11/20 22:29, Maxim Dounin wrote:
> > Hello!
> >
> > On Mon, Nov 11, 2019 at 03:07:02AM +0000, Zhangshaokun wrote:
> >
> >> # HG changeset patch
> >> # User Rui Sun <sunrui26 at huawei.com>
> >> # Date 1572848389 -28800
> >> # Mon Nov 04 14:19:49 2019 +0800
> >> # Branch local
> >> # Node ID a5ae6e9e99f747fcb45082bac8795622938184f1
> >> # Parent 89adf49fe76ada86d84e2af8f5cee9ca8c3dca19
> >> Optimal performance when use http non-persistent connection
> >>
> >> diff -r 89adf49fe76a -r a5ae6e9e99f7 src/core/ngx_cycle.c
> >> --- a/src/core/ngx_cycle.c Mon Oct 21 20:22:30 2019 +0300
> >> +++ b/src/core/ngx_cycle.c Mon Nov 04 14:19:49 2019 +0800
> >> @@ -35,6 +35,40 @@
> >> /* STUB */
> >>
> >>
> >> +void
> >> +ngx_change_pid_core(ngx_cycle_t *cycle)
> >> +{
> >> + ngx_pid_t setpid;
> >> + ngx_cpuset_t *setaffinity=NULL;
> >> + setpid = ngx_getpid();
> >> + {
> >> +#if (NGX_HAVE_CPU_AFFINITY)
> >> + ngx_core_conf_t *ccf;
> >> +
> >> + ccf = (ngx_core_conf_t *) ngx_get_conf(cycle->conf_ctx, ngx_core_module);
> >> +
> >> + if (ccf->cpu_affinity == NULL) {
> >> + setaffinity = NULL;
> >> + }
> >> +
> >> + if (ccf->cpu_affinity_auto) {
> >> + setaffinity = NULL;
> >> + }
> >> +
> >> + setaffinity = &ccf->cpu_affinity[0];
> >> +
> >> +#else
> >> +
> >> + setaffinity = NULL;
> >> +
> >> +#endif
> >> + }
> >> +
> >> + if (setaffinity)
> >> + // set new mask
> >> + sched_setaffinity(setpid, sizeof(ngx_cpuset_t), setaffinity);
> >> +}
> >> +
> >> ngx_cycle_t *
> >> ngx_init_cycle(ngx_cycle_t *old_cycle)
> >> {
> >> @@ -278,6 +312,8 @@
> >> return NULL;
> >> }
> >>
> >> + ngx_change_pid_core(cycle);
> >> +
> >> if (ngx_test_config && !ngx_quiet_mode) {
> >> ngx_log_stderr(0, "the configuration file %s syntax is ok",
> >> cycle->conf_file.data);
> >>
> >
> > Sorry, but it is not clear what you are trying to achieve with
> > this patch. You may want to provide more details.
> >
>
> We tested nginx on Kunpeng 920, which has 2 chips, and each chip has 2 NUMA
> nodes. We used 32 cores spread across 2 different NUMA nodes to test nginx.
> When nginx starts, the core the master process runs on is undefined. When
> the master's core and the worker's core are on the same chip, the
> performance of non-persistent connections is about 170,000 (17W), but when
> the master's core and the worker's core are on different chips, the
> performance of non-persistent connections is only about 120,000 (12W). Now,
> when nginx starts, we migrate the master process according to the first
> worker process's CPU affinity; the performance is shown as follows:
>
>                                                            | default | optimized
> master and worker process on same chip when nginx starts  |  171699 |    176020
> master and worker process on diff chips when nginx starts |  129639 |    180637
Ok, so you are trying to bind the master process to the same core
the first worker process runs on. Presumably, this can be
beneficial from a performance point of view in configurations with
a small number of worker processes, as various structures
allocated by the master process after parsing the configuration
will be allocated from the same NUMA region the worker process
runs on. Correct?
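For context, the mechanism assumed here is the Linux first-touch
policy: an anonymous page is normally placed on the NUMA node of the
CPU that first writes to it, and the worker then inherits those pages
over fork(). Below is a small standalone sketch (my own illustration,
not nginx code) that demonstrates this assumption; it assumes Linux
with libnuma installed (compile with -lnuma), and CPU 0 is just an
arbitrary choice:

    #define _GNU_SOURCE
    #include <sched.h>      /* cpu_set_t, sched_setaffinity() */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>     /* sysconf() */
    #include <numaif.h>     /* move_pages(), link with -lnuma */

    int
    main(void)
    {
        cpu_set_t   set;
        void       *p;
        long        page;
        int         status = -1;

        /* pin ourselves to CPU 0 before allocating */
        CPU_ZERO(&set);
        CPU_SET(0, &set);
        sched_setaffinity(0, sizeof(set), &set);

        /* allocate one page-aligned page and touch it: the first
         * write decides which NUMA node backs the page */
        page = sysconf(_SC_PAGESIZE);
        if (posix_memalign(&p, page, page) != 0) {
            return 1;
        }
        memset(p, 0, page);

        /* with nodes == NULL, move_pages() does not move anything,
         * it only reports the node each page currently resides on */
        if (move_pages(0, 1, &p, NULL, &status, 0) == 0) {
            printf("page resides on NUMA node %d\n", status);
        }

        free(p);
        return 0;
    }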
So the following questions are:
0. What units of measurement do the numbers use? Connections per
second? What are the error margins?
1. How did you test it? Given that many configuration
structures are allocated by the master process during
configuration parsing, the numbers look strange. I would expect
performance with the master and worker processes on different
chips to be lower than that on the same chip, even with the patch
applied.
Well, with error margins we'll probably see there is no difference
between 176020 and 180637, but this brings another question: where
does the difference between 129639 and 180637 come from? Listening
sockets created by the kernel on the same chip? So this probably
means we shouldn't bind the master process in general, but rather
create listening sockets on the same chip instead? Note this is
not the same, especially with reuseport, not to mention it
cannot be done at all when we inherit listening sockets from
previous configurations.
2. What happens when there are multiple worker processes? Will
this change still be beneficial, or negative, or neutral? Don't
you think the case you are trying to optimize is too narrow to
care about?
3. In nginx, there are platform-independent functions
ngx_get_cpu_affinity() and ngx_setaffinity() to work with CPU
affinity. Why are you not using them in your patch?
Additionally, why are you not trying to bind the master process to
a particular CPU with "worker_cpu_affinity auto;"?
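For illustration only, a rough sketch of what I would expect
instead, reusing those helpers (assuming it runs somewhere a cycle
pointer is available, e.g. in ngx_init_cycle(); error handling and
exact placement are left open):

    /* sketch: bind the master process to the CPU set configured for
     * the first worker, instead of calling sched_setaffinity()
     * directly */
    ngx_cpuset_t  *cpu_affinity;

    cpu_affinity = ngx_get_cpu_affinity(0);

    if (cpu_affinity) {
        ngx_setaffinity(cpu_affinity, cycle->log);
    }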
--
Maxim Dounin
http://mdounin.ru/