Message-ID: <7e0c2b99-00c1-4e64-ac68-50ba7500fd20@citrix.com>
Date: Tue, 29 Apr 2025 23:22:18 +0100
From: Andrew Cooper <andrew.cooper3@...rix.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...nel.org>,
Arnd Bergmann <arnd@...db.de>, Arnd Bergmann <arnd@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
x86@...nel.org, Juergen Gross <jgross@...e.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Alexander Usyskin <alexander.usyskin@...el.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Mateusz Jończyk <mat.jonczyk@...pl>,
Mike Rapoport <rppt@...nel.org>, Ard Biesheuvel <ardb@...nel.org>,
Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
xen-devel@...ts.xenproject.org
Subject: Re: [PATCH] bitops/32: Convert variable_ffs() and fls() zero-case
handling to C
On 29/04/2025 11:04 pm, Linus Torvalds wrote:
> On Tue, 29 Apr 2025 at 14:59, Andrew Cooper <andrew.cooper3@...rix.com> wrote:
>> do_variable_ffs() doesn't quite work.
>>
>> REP BSF is TZCNT, which unconditionally writes its output operand and
>> so defeats the attempt to preload it with -1.
>>
>> Drop the REP prefix, and it should work as intended.
> Bah. That's what I get for just doing it blindly without actually
> looking at the kernel source. I just copied the __ffs() thing - and
> there the 'rep' is not for the zero case - which we don't care about -
> but because tzcnt performs better on newer CPUs.
Oh, I didn't realise there was a perf difference as well, but Agner Fog
agrees.
Apparently on Zen 4, BSF and friends have become a single uop with a
sensible latency; previously they were 6-8 uops with a latency to match.
Intel appear to have had them as a single uop since Sandy Bridge, so for
quite a long time now.
~Andrew