Message-ID: <aA8oqKUaFU-0wb-D@gmail.com>
Date: Mon, 28 Apr 2025 09:05:12 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Andrew Cooper <andrew.cooper3@...rix.com>,
Arnd Bergmann <arnd@...db.de>, Arnd Bergmann <arnd@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>, Juergen Gross <jgross@...e.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Alexander Usyskin <alexander.usyskin@...el.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Mateusz Jończyk <mat.jonczyk@...pl>,
Mike Rapoport <rppt@...nel.org>, Ard Biesheuvel <ardb@...nel.org>,
Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
xen-devel@...ts.xenproject.org
Subject: Re: [PATCH] bitops/32: Convert variable_ffs() and fls() zero-case handling to C

* Ingo Molnar <mingo@...nel.org> wrote:
> > UNTESTED patch applied in case somebody wants to play with this. It
> > removes 10 lines of silly code, and along with them that 'cmov' use.
> >
> > Anybody?
>
> Makes sense - it seems to boot here, but I only did some very light
> testing.
>
> There's a minor text size increase on x86-32 defconfig, GCC 14.2.0:
>
>       text       data        bss        dec        hex    filename
>   16577728    7598826    1744896   25921450    18b87aa    vmlinux.before
>   16577908    7598838    1744896   25921642    18b886a    vmlinux.after
>
> bloatometer output:
>
> add/remove: 2/1 grow/shrink: 201/189 up/down: 5681/-3486 (2195)

And once we remove 486 support, I think we can apply the optimization
below and simply assume that BS*L doesn't clobber the output register
in the zero case, right?
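
For readers following along, here is a minimal C sketch of the semantics
this relies on. It is a software model, not the kernel code: model_bsf()
and model_bsr() are hypothetical helpers that assume the no-clobber
behavior (destination left unchanged when the source is zero), and they
use the GCC/Clang __builtin_ctz()/__builtin_clz() builtins in place of
the real instructions.

```c
#include <assert.h>

/*
 * Hypothetical model of BSF: index of the lowest set bit.
 * ASSUMPTION: when src == 0 the destination keeps its previous value.
 */
int model_bsf(unsigned int src, int dest_before)
{
	if (src == 0)
		return dest_before;
	return __builtin_ctz(src);
}

/* Hypothetical model of BSR: index of the highest set bit, same assumption. */
int model_bsr(unsigned int src, int dest_before)
{
	if (src == 0)
		return dest_before;
	return 31 - __builtin_clz(src);
}

/* Destination preloaded with -1, so x == 0 yields -1 + 1 == 0. */
int model_ffs(int x)
{
	return model_bsf((unsigned int)x, -1) + 1;
}

int model_fls(unsigned int x)
{
	return model_bsr(x, -1) + 1;
}

/* Small self-check of the 1-based results and the zero case. */
void model_selfcheck(void)
{
	assert(model_ffs(0) == 0);
	assert(model_ffs(8) == 4);
	assert(model_fls(0) == 0);
	assert(model_fls(6) == 3);
}
```

With the destination preloaded with -1, the zero case falls out as
-1 + 1 == 0 without a branch or CMOV, which is what the "0" (-1) input
constraint in the patch arranges.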

In terms of text size it's a substantial optimization on x86-32
defconfig:

        text       data        bss        dec        hex    filename
  16,577,728    7598826    1744896   25921450    18b87aa    vmlinux.vanilla      # CMOV+BS*L
  16,577,908    7598838    1744896   25921642    18b886a    vmlinux.linus_patch  # if()+BS*L
  16,573,568    7602922    1744896   25921386    18b876a    vmlinux.noclobber    # BS*L

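To make the comparison concrete, the deltas can be read off the table
with a few lines of C. This is only a sketch for checking the
arithmetic: the struct and function names are made up for illustration,
and the numbers are copied from the table above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one row of the size table (illustrative names). */
struct vmlinux_size {
	const char *name;
	long text, data, bss;
};

/* Rows copied from the table above. */
const struct vmlinux_size size_vanilla     = { "vmlinux.vanilla",     16577728, 7598826, 1744896 };
const struct vmlinux_size size_linus_patch = { "vmlinux.linus_patch", 16577908, 7598838, 1744896 };
const struct vmlinux_size size_noclobber   = { "vmlinux.noclobber",   16573568, 7602922, 1744896 };

/* Sum of the three sections, i.e. the "dec" column. */
long size_total(const struct vmlinux_size *s)
{
	return s->text + s->data + s->bss;
}

/* Print the text, data and dec deltas of `other` relative to `base`. */
void size_print_delta(const struct vmlinux_size *base,
		      const struct vmlinux_size *other)
{
	printf("%s vs %s: text %+ld data %+ld dec %+ld\n",
	       other->name, base->name,
	       other->text - base->text,
	       other->data - base->data,
	       size_total(other) - size_total(base));
}

/* Check that the copied rows add up to the table's "dec" column. */
void size_selfcheck(void)
{
	assert(size_total(&size_vanilla) == 25921450);
	assert(size_total(&size_linus_patch) == 25921642);
	assert(size_total(&size_noclobber) == 25921386);
}
```

By these numbers, the no-clobber variant shrinks text by 4160 bytes
against vanilla while data grows by 4096 bytes, so the total (dec)
shrinks by 64 bytes.
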
Thanks,
Ingo
---
arch/x86/include/asm/bitops.h | 20 ++------------------
1 file changed, 2 insertions(+), 18 deletions(-)
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 6061c87f14ac..e3e94a806656 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -308,24 +308,16 @@ static __always_inline int variable_ffs(int x)
 {
 	int r;
 
-#ifdef CONFIG_X86_64
 	/*
 	 * AMD64 says BSFL won't clobber the dest reg if x==0; Intel64 says the
 	 * dest reg is undefined if x==0, but their CPU architect says its
 	 * value is written to set it to the same as before, except that the
 	 * top 32 bits will be cleared.
-	 *
-	 * We cannot do this on 32 bits because at the very least some
-	 * 486 CPUs did not behave this way.
 	 */
 	asm("bsfl %1,%0"
 	    : "=r" (r)
 	    : ASM_INPUT_RM (x), "0" (-1));
-#else
-	if (!x)
-		return 0;
-	asm("bsfl %1,%0" : "=r" (r) : "rm" (x));
-#endif
+
 	return r + 1;
 }
 
@@ -360,24 +352,16 @@ static __always_inline int fls(unsigned int x)
 	if (__builtin_constant_p(x))
 		return x ? 32 - __builtin_clz(x) : 0;
 
-#ifdef CONFIG_X86_64
 	/*
 	 * AMD64 says BSRL won't clobber the dest reg if x==0; Intel64 says the
 	 * dest reg is undefined if x==0, but their CPU architect says its
 	 * value is written to set it to the same as before, except that the
 	 * top 32 bits will be cleared.
-	 *
-	 * We cannot do this on 32 bits because at the very least some
-	 * 486 CPUs did not behave this way.
 	 */
 	asm("bsrl %1,%0"
 	    : "=r" (r)
 	    : ASM_INPUT_RM (x), "0" (-1));
-#else
-	if (!x)
-		return 0;
-	asm("bsrl %1,%0" : "=r" (r) : "rm" (x));
-#endif
+
 	return r + 1;
 }
 