[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20220511160319.1045812-2-mailhol.vincent@wanadoo.fr>
Date: Thu, 12 May 2022 01:03:18 +0900
From: Vincent Mailhol <mailhol.vincent@...adoo.fr>
To: Nick Desaulniers <ndesaulniers@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H . Peter Anvin" <hpa@...or.com>,
Nathan Chancellor <nathan@...nel.org>,
Tom Rix <trix@...hat.com>, linux-kernel@...r.kernel.org,
llvm@...ts.linux.dev, David Howells <dhowells@...hat.com>,
Jan Beulich <JBeulich@...e.com>,
Vincent Mailhol <mailhol.vincent@...adoo.fr>
Subject: [PATCH v2 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate constant expressions
For x86_64, the current ffs() implementation does not produce
optimized code when called with a constant expression. On the
contrary, the __builtin_ffs() function of both GCC and clang is able
to simplify the expression into a single instruction.
* Example *
Let's consider two dummy functions foo() and bar() as below:
| #include <linux/bitops.h>
| #define CONST 0x01000000
|
| unsigned int foo(void)
| {
| return ffs(CONST);
| }
|
| unsigned int bar(void)
| {
| return __builtin_ffs(CONST);
| }
GCC would produce below assembly code:
| 0000000000000000 <foo>:
| 0: ba 00 00 00 01 mov $0x1000000,%edx
| 5: b8 ff ff ff ff mov $0xffffffff,%eax
| a: 0f bc c2 bsf %edx,%eax
| d: 83 c0 01 add $0x1,%eax
| 10: c3 ret
| 11: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
| 18: 00 00 00 00
| 1c: 0f 1f 40 00 nopl 0x0(%rax)
|
| 0000000000000020 <bar>:
| 20: b8 19 00 00 00 mov $0x19,%eax
| 25: c3 ret
And clang would produce:
| 0000000000000000 <foo>:
| 0: b8 ff ff ff ff mov $0xffffffff,%eax
| 5: 0f bc 05 00 00 00 00 bsf 0x0(%rip),%eax # c <foo+0xc>
| c: 83 c0 01 add $0x1,%eax
| f: c3 ret
|
| 0000000000000010 <bar>:
| 10: b8 19 00 00 00 mov $0x19,%eax
| 15: c3 ret
For both example, we clearly see the benefit of using __builtin_ffs()
instead of the kernel's asm implementation for constant
expressions.
However, for non constant expressions, the ffs() asm version of the
kernel remains better for x86_64 because, contrary to GCC, it doesn't
emit the CMOV assembly instruction, c.f. [1] (noticeably, clang is
able optimize out the CMOV call).
This patch uses the __builtin_constant_p() to select between the
kernel's ffs() and the __builtin_ffs() depending on whether the
argument is constant or not.
As a side benefit, this patch also removes below -Wshadow warning:
| ./arch/x86/include/asm/bitops.h:283:28: warning: declaration of 'ffs' shadows a built-in function [-Wshadow]
| 283 | static __always_inline int ffs(int x)
** Statistics **
On a allyesconfig, before applying this patch...:
| $ objdump -d vmlinux.o | grep bsf | wc -l
| 3607
...and after:
| $ objdump -d vmlinux.o | grep bsf | wc -l
| 792
So, roughly 26.7% of the call to ffs() were using constant expression
and were optimized out.
(tests done on linux v5.18-rc5 x86_64 using GCC 11.2.1)
[1] commit ca3d30cc02f7 ("x86_64, asm: Optimise fls(), ffs() and fls64()")
http://lkml.kernel.org/r/20111213145654.14362.39868.stgit@warthog.procyon.org.uk
CC: Nick Desaulniers <ndesaulniers@...gle.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@...adoo.fr>
---
arch/x86/include/asm/bitops.h | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index a288ecd230ab..6ed979547086 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -269,18 +269,7 @@ static __always_inline unsigned long __fls(unsigned long word)
#undef ADDR
#ifdef __KERNEL__
-/**
- * ffs - find first set bit in word
- * @x: the word to search
- *
- * This is defined the same way as the libc and compiler builtin ffs
- * routines, therefore differs in spirit from the other bitops.
- *
- * ffs(value) returns 0 if value is 0 or the position of the first
- * set bit if value is nonzero. The first (least significant) bit
- * is at position 1.
- */
-static __always_inline int ffs(int x)
+static __always_inline int variable_ffs(int x)
{
int r;
@@ -310,6 +299,19 @@ static __always_inline int ffs(int x)
return r + 1;
}
+/**
+ * ffs - find first set bit in word
+ * @x: the word to search
+ *
+ * This is defined the same way as the libc and compiler builtin ffs
+ * routines, therefore differs in spirit from the other bitops.
+ *
+ * ffs(value) returns 0 if value is 0 or the position of the first
+ * set bit if value is nonzero. The first (least significant) bit
+ * is at position 1.
+ */
+#define ffs(x) (__builtin_constant_p(x) ? __builtin_ffs(x) : variable_ffs(x))
+
/**
* fls - find last set bit in word
* @x: the word to search
--
2.35.1
Powered by blists - more mailing lists