[<prev] [next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.21.2601111836250.30566@angie.orcam.me.uk>
Date: Sun, 11 Jan 2026 21:21:57 +0000 (GMT)
From: "Maciej W. Rozycki" <macro@...am.me.uk>
To: Andrew Morton <akpm@...ux-foundation.org>, Jens Axboe <axboe@...nel.dk>,
John Garry <john.g.garry@...cle.com>, Su Hui <suhui@...china.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>
cc: David Laight <david.laight.linux@...il.com>, linux-kernel@...r.kernel.org
Subject: [PATCH] linux/log2.h: Reduce instruction count for is_power_of_2()
Follow an observation that (n ^ (n - 1)) will only ever retain the most
significant bit set in the word operated on if that is the only bit set
in the first place, and use it to determine whether a number is a whole
power of 2, avoiding the need for an explicit check for nonzero.
This reduces the sequence produced to 3 instructions only across Alpha,
MIPS, and RISC-V targets, down from 4, 5, and 4 respectively, removing a
branch in the two latter cases. And it's 5 instructions on POWER and
x86-64 vs 8 and 9 respectively. There are no branches now emitted here
for targets that have a suitable conditional set operation, although an
inline expansion will often end with one, depending on what code a call
to this function is used in.
Credit goes to GCC authors for coming up with this optimisation used as
the fallback for (__builtin_popcountl(n) == 1), equivalent to this code,
for targets where the hardware population count operation is considered
expensive.
Signed-off-by: Maciej W. Rozycki <macro@...am.me.uk>
---
Hi,
As discussed here[1] a further improvement might be possible with targets
that support a population count operation in hardware, but care needs to
be taken for a libcall not to be produced instead, so such code would have
to be conditionalised on the presence of a population count instruction
and its correct handling in the compiler (which GCC gets right for POWER,
but wrong for Alpha, apparently owing to an incorrect cost taken for the
operation).
Anyway, this seems like worthwhile if small an improvement regardless, so
please apply.
References:
[1] "treewide, bits: use ffs_val() where it is open-coded",
<https://lore.kernel.org/r/alpine.DEB.2.21.2601111450500.30566@angie.orcam.me.uk/>.
Maciej
---
include/linux/log2.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
linux-log2-pow2-opt.diff
Index: linux-macro/include/linux/log2.h
===================================================================
--- linux-macro.orig/include/linux/log2.h
+++ linux-macro/include/linux/log2.h
@@ -44,7 +44,7 @@ int __ilog2_u64(u64 n)
static __always_inline __attribute__((const))
bool is_power_of_2(unsigned long n)
{
- return (n != 0 && ((n & (n - 1)) == 0));
+ return n - 1 < (n ^ (n - 1));
}
/**
Powered by blists - more mailing lists