Message-ID: <20250305201429.3026ba9f@pumpkin>
Date: Wed, 5 Mar 2025 20:14:29 +0000
From: David Laight <david.laight.linux@...il.com>
To: Linus Torvalds <torvalds@...uxfoundation.org>
Cc: Uros Bizjak <ubizjak@...il.com>, Borislav Petkov <bp@...en8.de>, Dave
 Hansen <dave.hansen@...el.com>, x86@...nel.org,
 linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>, Thomas
 Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...nel.org>, Dave Hansen
 <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic
 locking insns

On Wed, 5 Mar 2025 07:04:08 -1000
Linus Torvalds <torvalds@...uxfoundation.org> wrote:

> On Tue, 4 Mar 2025 at 22:54, Uros Bizjak <ubizjak@...il.com> wrote:
> >
> > Even to my surprise, the patch has some noticeable effects on the
> > performance, please see the attachment in [1] for LMBench data or [2]
> > for some excerpts from the data. So, I think the patch has potential
> > to improve the performance.  
> 
> I suspect some of the performance difference - which looks
> unexpectedly large - is due to having run them on a CPU with the
> horrendous indirect return costs, and then inlining can make a huge
> difference.
...

Another possibility is that the processes are getting bounced around
the cpus in a slightly different way.
An idle cpu might be running at 800MHz; run something that spins on it
and the clock speed will soon jump to 4GHz.
But if your 'spinning' process is migrated to a different cpu it starts
again at 800MHz.
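
One way to take migration out of the picture is to pin the benchmark to a
single cpu before timing anything. A minimal sketch, assuming Linux and
Python 3; pin_to_cpu() is a hypothetical helper name, not something from
the patch or from LMBench:

```python
import os

def pin_to_cpu(cpu=None):
    """Restrict the calling process to one cpu and return that cpu number.

    Assumption: Linux only (os.sched_setaffinity is not available elsewhere).
    """
    allowed = sorted(os.sched_getaffinity(0))  # cpus we may currently run on
    if cpu is None:
        cpu = allowed[0]                       # arbitrary choice: lowest allowed cpu
    if cpu not in allowed:
        raise ValueError(f"cpu {cpu} is not in the allowed set {allowed}")
    os.sched_setaffinity(0, {cpu})             # pid 0 means the calling process
    return cpu

if __name__ == "__main__":
    print("pinned to cpu", pin_to_cpu())
```

Pinning stops the process restarting on a slow cpu, but it does not stop
the clock speed of that cpu changing during the run.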

(I had something where an fpga compile went from 12 mins to over 20 because
the kernel RSB stuffing caused the scheduler to behave differently even
though nothing was doing a lot of system calls.)

All sorts of things can affect that - possibly even making some code faster!

The (IIRC) 30k increase in code size will be a few functions being inlined.
The bloat-o-meter might show which, and forcing a few inlines the same way
should reduce that difference.
OTOH I'm surprised that one (or two) instructions make that much
difference - unless gcc is managing to discard the size of the entire
function rather than just the asm block itself.

Benchmarking on modern cpus is hard.
You really do need to lock the cpu frequencies - and that may not be supported.
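
Where the frequency cannot be locked, a run can at least be rejected if
the clock moved underneath it. A rough sketch, assuming Linux with the
cpufreq sysfs interface present (scaling_cur_freq, reported in kHz);
read_khz() and freq_stable() are hypothetical helper names and the 2%
tolerance is an arbitrary choice:

```python
def read_khz(cpu):
    """Return the current frequency of 'cpu' in kHz, or None if unavailable."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_cur_freq"
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None  # no cpufreq driver, no such cpu, or unreadable value

def freq_stable(before, after, tolerance=0.02):
    """True only if both readings exist and differ by at most 'tolerance'."""
    if before is None or after is None:
        return False  # stability cannot be confirmed
    return abs(after - before) <= tolerance * max(before, after)

if __name__ == "__main__":
    before = read_khz(0)
    sum(range(10_000_000))  # stand-in for the real benchmark
    after = read_khz(0)
    print(before, after, "stable" if freq_stable(before, after) else "discard run")
```

This only detects a change between two samples; it does not see the
frequency moving and coming back in between, so it is no substitute for
actually locking the frequency.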

	David
