[PATCH] Optimal performance when using HTTP non-persistent connections
Maxim Dounin
mdounin at mdounin.ru
Thu Nov 21 15:21:47 UTC 2019
Hello!
On Thu, Nov 21, 2019 at 07:22:16PM +0800, Shaokun Zhang wrote:
> Hi Maxim,
>
> On 2019/11/20 22:29, Maxim Dounin wrote:
> > Hello!
> >
> > On Mon, Nov 11, 2019 at 03:07:02AM +0000, Zhangshaokun wrote:
> >
> >> # HG changeset patch
> >> # User Rui Sun <sunrui26 at huawei.com>
> >> # Date 1572848389 -28800
> >> # Mon Nov 04 14:19:49 2019 +0800
> >> # Branch local
> >> # Node ID a5ae6e9e99f747fcb45082bac8795622938184f1
> >> # Parent 89adf49fe76ada86d84e2af8f5cee9ca8c3dca19
> >> Optimal performance when use http non-persistent connection
> >>
> >> diff -r 89adf49fe76a -r a5ae6e9e99f7 src/core/ngx_cycle.c
> >> --- a/src/core/ngx_cycle.c Mon Oct 21 20:22:30 2019 +0300
> >> +++ b/src/core/ngx_cycle.c Mon Nov 04 14:19:49 2019 +0800
> >> @@ -35,6 +35,40 @@
> >> /* STUB */
> >>
> >>
> >> +void
> >> +ngx_change_pid_core(ngx_cycle_t *cycle)
> >> +{
> >> + ngx_pid_t setpid;
> >> + ngx_cpuset_t *setaffinity=NULL;
> >> + setpid = ngx_getpid();
> >> + {
> >> +#if (NGX_HAVE_CPU_AFFINITY)
> >> + ngx_core_conf_t *ccf;
> >> +
> >> + ccf = (ngx_core_conf_t *) ngx_get_conf(cycle->conf_ctx, ngx_core_module);
> >> +
> >> + if (ccf->cpu_affinity == NULL) {
> >> + setaffinity = NULL;
> >> + }
> >> +
> >> + if (ccf->cpu_affinity_auto) {
> >> + setaffinity = NULL;
> >> + }
> >> +
> >> + setaffinity = &ccf->cpu_affinity[0];
> >> +
> >> +#else
> >> +
> >> + setaffinity = NULL;
> >> +
> >> +#endif
> >> + }
> >> +
> >> + if (setaffinity)
> >> + // set new mask
> >> + sched_setaffinity(setpid, sizeof(ngx_cpuset_t), setaffinity);
> >> +}
> >> +
> >> ngx_cycle_t *
> >> ngx_init_cycle(ngx_cycle_t *old_cycle)
> >> {
> >> @@ -278,6 +312,8 @@
> >> return NULL;
> >> }
> >>
> >> + ngx_change_pid_core(cycle);
> >> +
> >> if (ngx_test_config && !ngx_quiet_mode) {
> >> ngx_log_stderr(0, "the configuration file %s syntax is ok",
> >> cycle->conf_file.data);
> >>
> >
> > Sorry, but it is not clear what you are trying to achieve with
> > this patch. You may want to provide more details.
> >
>
> We tested nginx on Kunpeng 920, which has 2 chips, and each chip has 2 NUMA
> nodes. We used 32 cores spread across 2 different NUMA nodes to test nginx.
> When nginx starts, the core the master process runs on is undefined. When
> the master's core and the worker's core are on the same chip, the
> performance of non-persistent connections is about 170,000 (17W), but when
> the master's core and the worker's core are on different chips, the
> performance of non-persistent connections is only about 120,000 (12W). Now,
> when nginx starts, we migrate the master process according to the first
> worker process's CPU affinity; the performance is shown as follows:
>
>                                                            | default | optimized
> master and worker process on same chip when nginx starts  |  171699 |    176020
> master and worker process on diff chips when nginx starts |  129639 |    180637
Ok, so you are trying to bind the master process to the same core
the first worker process runs on. Presumably, this can be
beneficial from a performance point of view in configurations with
a small number of worker processes, as various structures
allocated by the master process after parsing the configuration
will be allocated from the same NUMA region the worker process
runs on. Correct?
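For context, the mechanism assumed here is the Linux first-touch
policy: an anonymous page is normally placed on the NUMA node of the
CPU that first writes to it, and the worker then inherits those pages
over fork(). Below is a small standalone sketch (my own illustration,
not nginx code) that demonstrates this assumption; it assumes Linux
with libnuma installed (compile with -lnuma), and CPU 0 is just an
arbitrary choice:

    #define _GNU_SOURCE
    #include <sched.h>      /* cpu_set_t, sched_setaffinity() */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>     /* sysconf() */
    #include <numaif.h>     /* move_pages(), link with -lnuma */

    int
    main(void)
    {
        cpu_set_t   set;
        void       *p;
        long        page;
        int         status = -1;

        /* pin ourselves to CPU 0 before allocating */
        CPU_ZERO(&set);
        CPU_SET(0, &set);
        sched_setaffinity(0, sizeof(set), &set);

        /* allocate one page-aligned page and touch it: the first
         * write decides which NUMA node backs the page */
        page = sysconf(_SC_PAGESIZE);
        if (posix_memalign(&p, page, page) != 0) {
            return 1;
        }
        memset(p, 0, page);

        /* with nodes == NULL, move_pages() does not move anything,
         * it only reports the node each page currently resides on */
        if (move_pages(0, 1, &p, NULL, &status, 0) == 0) {
            printf("page resides on NUMA node %d\n", status);
        }

        free(p);
        return 0;
    }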
So the following questions are:
0. What units of measurement do the numbers use? Connections per
second? What are the error margins?
1. How did you test it? Given that many configuration
structures are allocated by the master process during
configuration parsing, the numbers look strange. I would expect
performance with the master and worker processes on different
chips to be lower than that on the same chip, even with the patch
applied.
Well, with error margins we'll probably see there is no difference
between 176020 and 180637, but this brings another question: where
does the difference between 129639 and 180637 come from? Listening
sockets created by the kernel on the same chip? So this probably
means we shouldn't bind the master process in general, but rather
create listening sockets on the same chip instead? Note this is
not the same, especially with reuseport, not to mention it
cannot be done at all when we inherit listening sockets from
previous configurations.
2. What happens when there are multiple worker processes? Will
this change still be beneficial, or negative, or neutral? Don't
you think the case you are trying to optimize is too narrow to
care about?
3. In nginx, there are platform-independent functions
ngx_get_cpu_affinity() and ngx_setaffinity() to work with CPU
affinity. Why are you not using them in your patch?
Additionally, why are you not trying to bind the master process to
a particular CPU with "worker_cpu_affinity auto;"?
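For illustration only, a rough sketch of what I would expect
instead, reusing those helpers (assuming it runs somewhere a cycle
pointer is available, e.g. in ngx_init_cycle(); error handling and
exact placement are left open):

    /* sketch: bind the master process to the CPU set configured for
     * the first worker, instead of calling sched_setaffinity()
     * directly */
    ngx_cpuset_t  *cpu_affinity;

    cpu_affinity = ngx_get_cpu_affinity(0);

    if (cpu_affinity) {
        ngx_setaffinity(cpu_affinity, cycle->log);
    }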
--
Maxim Dounin
http://mdounin.ru/