lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20220510142550.1686866-2-mailhol.vincent@wanadoo.fr>
Date:   Tue, 10 May 2022 23:25:49 +0900
From:   Vincent Mailhol <mailhol.vincent@...adoo.fr>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>
Cc:     Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H . Peter Anvin" <hpa@...or.com>,
        Nathan Chancellor <nathan@...nel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        Tom Rix <trix@...hat.com>, linux-kernel@...r.kernel.org,
        llvm@...ts.linux.dev, Vincent Mailhol <mailhol.vincent@...adoo.fr>
Subject: [PATCH 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate constant expressions

For x86_64, the current ffs() implementation does not produce
optimized code when called with a constant expression. On the
contrary, the __builtin_ffs() function of both GCC and clang is able
to simplify the expression into a single instruction.

* Example *

Let's consider two dummy functions foo() and bar() as below:

| #include <linux/bitops.h>
| #define CONST 0x01000000
|
| unsigned int foo(void)
| {
| 	return ffs(CONST);
| }
|
| unsigned int bar(void)
| {
| 	return __builtin_ffs(CONST);
| }

GCC would produce below assembly code:

| 0000000000000000 <foo>:
|    0:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
|    5:	0f bc c7             	bsf    %edi,%eax
|    8:	83 c0 01             	add    $0x1,%eax
|    b:	c3                   	ret
|    c:	0f 1f 40 00          	nopl   0x0(%rax)
|
| 0000000000000010 <bar>:
|   10:	b8 19 00 00 00       	mov    $0x19,%eax
|   15:	c3                   	ret

And clang would produce:

| 0000000000000000 <foo>:
|    0:	55                   	push   %rbp
|    1:	48 89 e5             	mov    %rsp,%rbp
|    4:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
|    9:	0f bc 05 00 00 00 00 	bsf    0x0(%rip),%eax        # 10 <foo+0x10>
|   10:	ff c0                	inc    %eax
|   12:	5d                   	pop    %rbp
|   13:	c3                   	ret
|   14:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
|   1b:	00 00 00
|   1e:	66 90                	xchg   %ax,%ax
|
| 0000000000000020 <bar>:
|   20:	55                   	push   %rbp
|   21:	48 89 e5             	mov    %rsp,%rbp
|   24:	b8 19 00 00 00       	mov    $0x19,%eax
|   29:	5d                   	pop    %rbp
|   2a:	c3                   	ret

For both examples, we clearly see the benefit of using __builtin_ffs()
instead of the kernel's asm implementation for constant expressions.

However, for non constant expressions, the ffs() asm version of the
kernel remains better for x86_64 because, contrary to GCC, it doesn't
emit the CMOV assembly instruction, c.f. [1] (noticeably, clang is
able optimize out the CMOV call).

This patch uses the __builtin_constant_p() to select between the
kernel's ffs() and the __builtin_ffs() depending on whether the
argument is constant or not.


As a side benefit, this patch also removes below -Wshadow warning:

| ./arch/x86/include/asm/bitops.h:283:28: warning: declaration of 'ffs' shadows a built-in function [-Wshadow]
|   283 | static __always_inline int ffs(int x)

[1] commit ca3d30cc02f7 ("x86_64, asm: Optimise fls(), ffs() and fls64()")
http://lkml.kernel.org/r/20111213145654.14362.39868.stgit@warthog.procyon.org.uk


Signed-off-by: Vincent Mailhol <mailhol.vincent@...adoo.fr>
---
 arch/x86/include/asm/bitops.h | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index a288ecd230ab..535a7a358c14 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -269,18 +269,7 @@ static __always_inline unsigned long __fls(unsigned long word)
 #undef ADDR
 
 #ifdef __KERNEL__
-/**
- * ffs - find first set bit in word
- * @x: the word to search
- *
- * This is defined the same way as the libc and compiler builtin ffs
- * routines, therefore differs in spirit from the other bitops.
- *
- * ffs(value) returns 0 if value is 0 or the position of the first
- * set bit if value is nonzero. The first (least significant) bit
- * is at position 1.
- */
-static __always_inline int ffs(int x)
+static __always_inline int __ffs_asm(int x)
 {
 	int r;
 
@@ -310,6 +299,22 @@ static __always_inline int ffs(int x)
 	return r + 1;
 }
 
+/**
+ * ffs - find first set bit in word
+ * @x: the word to search
+ *
+ * This is defined the same way as the libc and compiler builtin ffs
+ * routines, therefore differs in spirit from the other bitops.
+ *
+ * ffs(value) returns 0 if value is 0 or the position of the first
+ * set bit if value is nonzero. The first (least significant) bit
+ * is at position 1.
+ */
+#define ffs(x)					\
+	 (__builtin_constant_p(x) ?		\
+	  __builtin_ffs(x) :			\
+	  __ffs_asm(x))
+
 /**
  * fls - find last set bit in word
  * @x: the word to search
-- 
2.35.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ