[PATCH] Replaced loop with __builtin_ctzl for detecting first 0 in bitnamp
Maxim Dounin
mdounin at mdounin.ru
Fri Nov 27 20:16:39 UTC 2020
Hello!
On Fri, Nov 27, 2020 at 01:22:46PM -0500, goldstein.w.n at gmail.com wrote:
> # HG changeset patch
> # User Noah Goldstein <goldstein.w.n at gmail.com>
> # Date 1606497081 18000
> # Fri Nov 27 12:11:21 2020 -0500
> # Node ID 7ec2fc7b29d6614df28152dd4a895e6139138890
> # Parent 90cc7194e993f8d722347e9f46a00f65dffc3935
> Replaced loop with __builtin_ctzl for detecting first 0 in bitnamp
> No particular pressing reason for this change other than the performance benefit it yields. Converts a loop that could take 63 iterations (and had a branch) to 5 instructions
>
> diff -r 90cc7194e993 -r 7ec2fc7b29d6 src/core/ngx_slab.c
> --- a/src/core/ngx_slab.c Fri Nov 27 00:01:20 2020 +0300
> +++ b/src/core/ngx_slab.c Fri Nov 27 12:11:21 2020 -0500
> @@ -235,36 +235,33 @@
>
> if (bitmap[n] != NGX_SLAB_BUSY) {
>
> - for (m = 1, i = 0; m; m <<= 1, i++) {
> - if (bitmap[n] & m) {
> - continue;
> + m = ~bitmap[n];
> + i = __builtin_ctzl(m);
Thank you for the patch.
Unfortunately, __builtin_ctzl() is not portable and hence the
patch cannot be committed for obvious reasons. Further, even if
__builtin_ctzl() is available, there no guarantees that it can be
used on an uintptr_t variable, as uintptr_t can be larger than
long (notably, 64bit Windows uses LLP64 data model).
Also note that there are other similar loops in the various places
of the code, and changing just one would certainly confuse
readers.
Note well that slab allocations are expected to be rare, and this
loop is not expected to be performance-critical. Most performance
impact on slab allocations are expected to be from shared mutex
locking.
If you have practical reasons to assume this code needs to be
optimized, please share the details. If it indeed needs to, we
can consider making a portable solution.
--
Maxim Dounin
http://mdounin.ru/
More information about the nginx-devel
mailing list