Message-ID: <Z8l7KeVvvHvmPmRc@gmail.com>
Date: Thu, 6 Mar 2025 11:38:33 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Uros Bizjak <ubizjak@...il.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic
 locking insns

* Uros Bizjak <ubizjak@...il.com> wrote:

> > So these are roughly the high level requirements around such patches.
> > Does this make sense?
> 
> In my opinion, the writing above makes perfect sense. As far as I'm
> concerned, I can do the above code analysis; the problematic part
> would be precise performance measurements. Although with your
> instructions, I can also try that.

Yeah, so *personally* I find the kind of code generation analysis you
routinely perform for your micro-optimization patches far more useful
and persuasive, because it's basically a first principles argument:
instructions removed are an inarguable positive in the overwhelming
majority of cases, all other things being equal (as long as it doesn't
come at the expense of more function calls or worse instructions, etc.).

For inlining decisions code generation analysis is arguably more 
complicated - but that's the nature of inlining related patches.

Performance measurements can back up such arguments, and proficiency
in perf tooling is a useful skill to have anyway, but it's
fundamentally a stochastic argument for something as comparatively
small as a +0.12% code size increase.

If code generation analysis is inconclusive or even negative, then
performance measurements can trump all of that, but it's a substantial
barrier to entry as you noted - and I'm somewhat sceptical whether a
0.12% code generation effect *can* be measured reliably, even with
the best of expertise and infrastructure...
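
To put a rough number on why an effect of this size is hard to measure,
here is a minimal sketch of the standard two-sample sample-size estimate
under a normal approximation. Everything in it is an assumption made for
illustration: the function name runs_needed is hypothetical, the
performance effect is taken to be the same 0.12% as the code size change
(which need not hold), and the run-to-run noise levels are made up
rather than measured.

```python
from math import ceil
from statistics import NormalDist


def runs_needed(effect_pct, noise_pct, alpha=0.05, power=0.80):
    """Benchmark runs needed per kernel to detect a mean shift.

    Two-sided, two-sample z-test approximation:
        n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2
    effect_pct is the assumed mean difference (delta) and noise_pct the
    assumed run-to-run standard deviation (sigma), both in percent.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * noise_pct / effect_pct) ** 2)


if __name__ == "__main__":
    # Hypothetical noise levels; a real benchmark's variance must be measured.
    for noise in (0.1, 0.5, 1.0):
        print(f"noise {noise}% -> {runs_needed(0.12, noise)} runs per kernel")
```

Under these assumptions, a benchmark with 1% run-to-run noise would need
on the order of a thousand runs of each kernel to resolve a 0.12%
difference, and the count grows with the square of the noise.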

Also, to shorten build & test times you can use the x86-64 defconfig. 
It's a config more or less representative of what major distros enable, 
and it's even bootable on some systems and in VMs, but it builds in far 
less time.

Thanks,

	Ingo