FW: [PATCH] Using worker specific counters for http_stub_status module

Tue Jan 9 19:08:41 UTC 2018

Hello,

I observed this while benchmarking nginx with 48 workers using wrk, on two ARM systems connected back to back over a high-speed link, with the client requesting several random files.
As you rightly mentioned, this may not affect the performance of real workloads in any significant way; it is something I observed during benchmarking.
However, if shared memory contention is easy to avoid, it may make sense to avoid it, as it might have a negative impact on some platforms under peak load.
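
To illustrate the idea, here is a minimal standalone sketch of per-worker counters. It is not the patch itself: the names SLOT_SIZE, MAX_WORKERS, counter_slot_t, counter_inc and counter_sum are hypothetical, and a static array stands in for the shared memory zone. Each worker writes only to its own padded slot, so no locked read-modify-write is needed, and the slots are summed only when the status is requested.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_SIZE   128   /* assumed to be >= the cache line size */
#define MAX_WORKERS 64    /* hypothetical upper bound for this sketch */

/* One slot per worker; consecutive counters are SLOT_SIZE bytes apart,
 * so two workers' counters do not share a cache line of up to SLOT_SIZE. */
typedef union {
    _Atomic uint64_t  value;
    unsigned char     pad[SLOT_SIZE];
} counter_slot_t;

/* Stand-in for an array that would live in shared memory. */
static counter_slot_t requests[MAX_WORKERS];

/* Called by a worker on its own slot only. With a single writer per slot,
 * a relaxed load followed by a relaxed store is enough. */
static void
counter_inc(counter_slot_t *slots, size_t worker)
{
    uint64_t  v;

    assert(worker < MAX_WORKERS);

    v = atomic_load_explicit(&slots[worker].value, memory_order_relaxed);
    atomic_store_explicit(&slots[worker].value, v + 1, memory_order_relaxed);
}

/* Called when the status is requested: sum the slots of all workers. */
static uint64_t
counter_sum(counter_slot_t *slots, size_t nworkers)
{
    size_t    i;
    uint64_t  total = 0;

    for (i = 0; i < nworkers; i++) {
        total += atomic_load_explicit(&slots[i].value, memory_order_relaxed);
    }

    return total;
}
```

The sum is not a consistent snapshot across workers, which seems acceptable for status counters of this kind.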

Also, in the code the counter slot size is hardcoded to 128, with a comment saying to keep it equal to or greater than the cache line (CL) size.
Would it make sense to use ngx_cacheline_size instead of hardcoding the largest CL size?
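
As a rough sketch of what that would change, the helpers below (slot_stride and counter_footprint are hypothetical names, not nginx functions) round a counter up to a whole number of cache lines and compute the memory needed per variable. The figure of 1024 slots used in the comments is an assumption chosen because 128 bytes x 1024 slots gives the 128k per variable mentioned in the reply quoted below.

```c
#include <stddef.h>

/* Round the counter size up to a whole number of cache lines.
 * In nginx, cacheline_size would come from ngx_cacheline_size at run time. */
static size_t
slot_stride(size_t counter_size, size_t cacheline_size)
{
    return ((counter_size + cacheline_size - 1) / cacheline_size)
           * cacheline_size;
}

/* Memory needed for one variable: one slot per possible worker.
 * Example: a stride of 128 with 1024 slots (assumed) is 131072 bytes,
 * while a stride of 64 with the same slot count is 65536 bytes. */
static size_t
counter_footprint(size_t stride, size_t nslots)
{
    return stride * nslots;
}
```

With a 64-byte cache line this would halve the footprint compared with a fixed 128-byte slot, at the cost of computing the stride at run time.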

Thanks
Debayan

-----Original Message-----
From: Maxim Dounin [mailto:mdounin at mdounin.ru]
Sent: Sunday, January 7, 2018 8:44 PM
To: nginx-devel at nginx.org
Cc: debayang.qdt <debayang.qdt at qualcommdatacenter.com>
Subject: Re: [PATCH] Using worker specific counters for http_stub_status module

Hello!

On Fri, Jan 05, 2018 at 01:34:56PM +0000, debayang.qdt wrote:

> When the http_stub_status_module is enabled, a performance impact is
> seen on some platforms with several worker processes running and
> increased workload.
> There is a contention with the atomic updates of the several shared 
> memory counters maintained by this module - which could be eliminated 
> if we maintain worker process specific counters and only sum them up 
> when requested by client.
> 
> Below patch is an attempt to do so - which bypasses the contention and 
> improves performance on such platforms.

So far we haven't seen any noticeable performance degradation on real workloads due to stub status being enabled.  Several atomic increments aren't visible compared to generic request processing costs.  If you've seen any noticeable performance degradation on real workloads - you may want to share your observations first.

Also, it might not be a good idea to spend 128k per variable, as this might be noticeable on platforms with small amounts of memory.

--
Maxim Dounin
http://mdounin.ru/