[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20220510142550.1686866-2-mailhol.vincent@wanadoo.fr>
Date: Tue, 10 May 2022 23:25:49 +0900
From: Vincent Mailhol <mailhol.vincent@...adoo.fr>
To: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H . Peter Anvin" <hpa@...or.com>,
Nathan Chancellor <nathan@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Tom Rix <trix@...hat.com>, linux-kernel@...r.kernel.org,
llvm@...ts.linux.dev, Vincent Mailhol <mailhol.vincent@...adoo.fr>
Subject: [PATCH 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate constant expressions
For x86_64, the current ffs() implementation does not produce
optimized code when called with a constant expression. On the
contrary, the __builtin_ffs() function of both GCC and clang is able
to simplify the expression into a single instruction.
* Example *
Let's consider two dummy functions foo() and bar() as below:
| #include <linux/bitops.h>
| #define CONST 0x01000000
|
| unsigned int foo(void)
| {
| return ffs(CONST);
| }
|
| unsigned int bar(void)
| {
| return __builtin_ffs(CONST);
| }
GCC would produce below assembly code:
| 0000000000000000 <foo>:
| 0: b8 ff ff ff ff mov $0xffffffff,%eax
| 5: 0f bc c7 bsf %edi,%eax
| 8: 83 c0 01 add $0x1,%eax
| b: c3 ret
| c: 0f 1f 40 00 nopl 0x0(%rax)
|
| 0000000000000010 <bar>:
| 10: b8 19 00 00 00 mov $0x19,%eax
| 15: c3 ret
And clang would produce:
| 0000000000000000 <foo>:
| 0: 55 push %rbp
| 1: 48 89 e5 mov %rsp,%rbp
| 4: b8 ff ff ff ff mov $0xffffffff,%eax
| 9: 0f bc 05 00 00 00 00 bsf 0x0(%rip),%eax # 10 <foo+0x10>
| 10: ff c0 inc %eax
| 12: 5d pop %rbp
| 13: c3 ret
| 14: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
| 1b: 00 00 00
| 1e: 66 90 xchg %ax,%ax
|
| 0000000000000020 <bar>:
| 20: 55 push %rbp
| 21: 48 89 e5 mov %rsp,%rbp
| 24: b8 19 00 00 00 mov $0x19,%eax
| 29: 5d pop %rbp
| 2a: c3 ret
For both examples, we clearly see the benefit of using __builtin_ffs()
instead of the kernel's asm implementation for constant expressions.
However, for non constant expressions, the ffs() asm version of the
kernel remains better for x86_64 because, contrary to GCC, it doesn't
emit the CMOV assembly instruction, c.f. [1] (noticeably, clang is
able optimize out the CMOV call).
This patch uses the __builtin_constant_p() to select between the
kernel's ffs() and the __builtin_ffs() depending on whether the
argument is constant or not.
As a side benefit, this patch also removes below -Wshadow warning:
| ./arch/x86/include/asm/bitops.h:283:28: warning: declaration of 'ffs' shadows a built-in function [-Wshadow]
| 283 | static __always_inline int ffs(int x)
[1] commit ca3d30cc02f7 ("x86_64, asm: Optimise fls(), ffs() and fls64()")
http://lkml.kernel.org/r/20111213145654.14362.39868.stgit@warthog.procyon.org.uk
Signed-off-by: Vincent Mailhol <mailhol.vincent@...adoo.fr>
---
arch/x86/include/asm/bitops.h | 29 +++++++++++++++++------------
1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index a288ecd230ab..535a7a358c14 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -269,18 +269,7 @@ static __always_inline unsigned long __fls(unsigned long word)
#undef ADDR
#ifdef __KERNEL__
-/**
- * ffs - find first set bit in word
- * @x: the word to search
- *
- * This is defined the same way as the libc and compiler builtin ffs
- * routines, therefore differs in spirit from the other bitops.
- *
- * ffs(value) returns 0 if value is 0 or the position of the first
- * set bit if value is nonzero. The first (least significant) bit
- * is at position 1.
- */
-static __always_inline int ffs(int x)
+static __always_inline int __ffs_asm(int x)
{
int r;
@@ -310,6 +299,22 @@ static __always_inline int ffs(int x)
return r + 1;
}
+/**
+ * ffs - find first set bit in word
+ * @x: the word to search
+ *
+ * This is defined the same way as the libc and compiler builtin ffs
+ * routines, therefore differs in spirit from the other bitops.
+ *
+ * ffs(value) returns 0 if value is 0 or the position of the first
+ * set bit if value is nonzero. The first (least significant) bit
+ * is at position 1.
+ */
+#define ffs(x) \
+ (__builtin_constant_p(x) ? \
+ __builtin_ffs(x) : \
+ __ffs_asm(x))
+
/**
* fls - find last set bit in word
* @x: the word to search
--
2.35.1
Powered by blists - more mailing lists