Message-ID: <20250305201429.3026ba9f@pumpkin>
Date: Wed, 5 Mar 2025 20:14:29 +0000
From: David Laight <david.laight.linux@...il.com>
To: Linus Torvalds <torvalds@...uxfoundation.org>
Cc: Uros Bizjak <ubizjak@...il.com>, Borislav Petkov <bp@...en8.de>, Dave
 Hansen <dave.hansen@...el.com>, x86@...nel.org,
 linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>, Thomas
 Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...nel.org>, Dave Hansen
 <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic
 locking insns

On Wed, 5 Mar 2025 07:04:08 -1000
Linus Torvalds <torvalds@...uxfoundation.org> wrote:

> On Tue, 4 Mar 2025 at 22:54, Uros Bizjak <ubizjak@...il.com> wrote:
> >
> > Even to my surprise, the patch has some noticeable effects on the
> > performance, please see the attachment in [1] for LMBench data or [2]
> > for some excerpts from the data. So, I think the patch has potential
> > to improve the performance.  
> 
> I suspect some of the performance difference - which looks
> unexpectedly large - is due to having run them on a CPU with the
> horrendous indirect return costs, and then inlining can make a huge
> difference.
...

Another possibility is that the processes are getting bounced between
cpus in a slightly different way.
An idle cpu might be running at 800MHz; run something that spins on it
and the clock speed will soon jump to 4GHz.
But if your 'spinning' process is migrated to a different cpu it starts
again at 800MHz.

(I had something where an FPGA compile went from 12 mins to over 20 because
the kernel RSB stuffing caused the scheduler to behave differently even
though nothing was doing a lot of system calls.)

All sorts of things can affect that - possibly even making some code faster!

The (IIRC) 30k increase in code size will be a few functions being inlined.
The bloat-o-meter might show which, and forcing a few inlines the same way
should reduce that difference.
OTOH I'm surprised that one or two instructions make that much
difference - unless gcc is managing to discount the size of the entire
function rather than just the asm block itself.
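
For reference, a minimal sketch of the kind of change being discussed,
assuming gcc 9 or later (or a clang that accepts the 'inline' asm
qualifier). gcc estimates the size of an asm statement from its text;
with the 'inline' qualifier the statement is counted as the smallest
possible size when the inliner costs the containing function. The names
my_asm_inline and counter_inc are made up for illustration and are not
the kernel's definitions; the non-x86 branch is only there so the sketch
builds elsewhere.

```c
/* Hypothetical stand-in for the kernel's asm_inline macro. */
#define my_asm_inline __asm__ __inline__

/*
 * Atomically increment *v.  On x86 this is one locked instruction.
 * The 'inline' qualifier only changes how the inliner costs the asm,
 * it does not change the generated instruction.
 */
static inline void counter_inc(int *v)
{
#if defined(__x86_64__) || defined(__i386__)
	my_asm_inline __volatile__("lock incl %0"
				   : "+m" (*v)
				   :
				   : "memory", "cc");
#else
	/* Illustration-only fallback for other architectures. */
	__atomic_fetch_add(v, 1, __ATOMIC_SEQ_CST);
#endif
}
```

Comparing the inlining decisions for callers with and without the
qualifier (for example with bloat-o-meter on the two builds) is one way
to see which functions changed.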

Benchmarking on modern cpus is hard.
You really do need to lock the cpu frequencies - and that may not be supported.
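
Where the frequencies cannot be locked, a benchmark can at least notice
that the clock moved during a run. The following is a rough sketch, not
a tested harness: it assumes the Linux cpufreq sysfs file
scaling_cur_freq is present, and the helper names (read_khz, cpu_cur_khz,
freq_stable) and the percentage tolerance are made up for illustration.

```c
#include <stdio.h>

/*
 * Read one integer (kHz) from a cpufreq sysfs file.
 * Returns -1 if the file is missing or cannot be parsed.
 */
long read_khz(const char *path)
{
	FILE *f = fopen(path, "r");
	long khz;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &khz) != 1)
		khz = -1;
	fclose(f);
	return khz;
}

/* Current frequency of one cpu, or -1 if cpufreq is not exposed. */
long cpu_cur_khz(int cpu)
{
	char path[128];

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq",
		 cpu);
	return read_khz(path);
}

/* Treat a run as suspect if the clock moved by more than tol_pct. */
int freq_stable(long before_khz, long after_khz, int tol_pct)
{
	long diff = before_khz > after_khz ? before_khz - after_khz
					   : after_khz - before_khz;

	if (before_khz <= 0 || after_khz <= 0)
		return 0;	/* unknown frequency: do not trust the run */
	return diff * 100 <= before_khz * tol_pct;
}
```

Sampling cpu_cur_khz() before and after the timed section and discarding
runs where freq_stable() fails will not catch a migration to another
cpu, so it is only useful together with pinning the task.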

	David