[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKwvOdnH_gYv4qRN9pKY7jNTQK95xNeH1w1KZJJmvCkh8xJLBg@mail.gmail.com>
Date: Wed, 11 May 2022 14:35:54 -0700
From: Nick Desaulniers <ndesaulniers@...gle.com>
To: Vincent Mailhol <mailhol.vincent@...adoo.fr>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H . Peter Anvin" <hpa@...or.com>,
Nathan Chancellor <nathan@...nel.org>,
Tom Rix <trix@...hat.com>, linux-kernel@...r.kernel.org,
llvm@...ts.linux.dev, David Howells <dhowells@...hat.com>,
Jan Beulich <JBeulich@...e.com>
Subject: Re: [PATCH v2 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate
constant expressions
On Wed, May 11, 2022 at 9:03 AM Vincent Mailhol
<mailhol.vincent@...adoo.fr> wrote:
>
> For x86_64, the current ffs() implementation does not produce
> optimized code when called with a constant expression. On the
> contrary, the __builtin_ffs() function of both GCC and clang is able
> to simplify the expression into a single instruction.
>
> * Example *
>
> Let's consider two dummy functions foo() and bar() as below:
>
> | #include <linux/bitops.h>
> | #define CONST 0x01000000
> |
> | unsigned int foo(void)
> | {
> | return ffs(CONST);
> | }
> |
> | unsigned int bar(void)
> | {
> | return __builtin_ffs(CONST);
> | }
>
> GCC would produce below assembly code:
Thanks for the patch! LGTM.
Reviewed-by: Nick Desaulniers <ndesaulniers@...gle.com>
>
> | 0000000000000000 <foo>:
> | 0: ba 00 00 00 01 mov $0x1000000,%edx
> | 5: b8 ff ff ff ff mov $0xffffffff,%eax
> | a: 0f bc c2 bsf %edx,%eax
> | d: 83 c0 01 add $0x1,%eax
> | 10: c3 ret
This should be the end of foo. I...actually don't know what's at the
end here. But I don't think the region from here...
> | 11: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
> | 18: 00 00 00 00
> | 1c: 0f 1f 40 00 nopl 0x0(%rax)
...to here is relevant.
> |
> | 0000000000000020 <bar>:
> | 20: b8 19 00 00 00 mov $0x19,%eax
> | 25: c3 ret
>
> And clang would produce:
>
> | 0000000000000000 <foo>:
> | 0: b8 ff ff ff ff mov $0xffffffff,%eax
> | 5: 0f bc 05 00 00 00 00 bsf 0x0(%rip),%eax # c <foo+0xc>
> | c: 83 c0 01 add $0x1,%eax
> | f: c3 ret
Weird, so I just tried this:
```
$ cat /tmp/x.c
#define CONST 0x01000000
unsigned ffs (int x) {
int r;
asm("bsfl %1,%0"
: "=r" (r)
: "rm" (x), "0" (-1));
return r;
}
unsigned int foo(void) {
return ffs(CONST);
}
unsigned int bar(void) {
return __builtin_ffs(CONST);
}
$ clang /tmp/x.c -O2 -o /tmp/x.o -c && llvm-objdump -dr /tmp/x.o
--disassemble-symbols=foo
...
0000000000000010 <foo>:
10: b8 19 00 00 00 movl $25, %eax
15: c3 retq
16: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
```
but if we make `ffs` `static`, we get:
```
0000000000000000 <foo>:
0: b8 ff ff ff ff movl $4294967295, %eax
# imm = 0xFFFFFFFF
5: 0f bc 05 00 00 00 00 bsfl (%rip), %eax
# 0xc <foo+0xc>
0000000000000008: R_X86_64_PC32 .LCPI0_0-0x4
c: c3 retq
d: 0f 1f 00 nopl (%rax)
```
Which is very interesting to me; it looks like constant propagation
actually hurt optimization, we lost that this was a libcall which we
could have optimized.
As in LLVM does:
1. sink CONST into ffs; it's static and has one caller
2. delete x parameter; it's unused
3. now libcall optimization just sees a call to ffs with no params,
that doesn't match the signature of libc.
Your change should fix that since we don't even call a function named
ffs if we have a constant (explicitly, or via optimization). Filed
https://github.com/llvm/llvm-project/issues/55394
--
Thanks,
~Nick Desaulniers
Powered by blists - more mailing lists