Message-ID: <Z8l7KeVvvHvmPmRc@gmail.com>
Date: Thu, 6 Mar 2025 11:38:33 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Uros Bizjak <ubizjak@...il.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic
 locking insns


* Uros Bizjak <ubizjak@...il.com> wrote:

> > So these are roughly the high level requirements around such patches.
> > Does this make sense?
> 
> In my opinion, the writing above makes perfect sense. As far as I'm 
> concerned, I'm able and can do the above code analysis, the 
> problematic part would be precise performance measurements. Although 
> with your instructions, I can also try that.

Yeah, so *personally* I find the kind of code generation analysis you 
routinely perform for your micro-optimization patches far more useful 
and persuasive, because it's basically a first principles argument: 
instructions removed are an inarguable positive in the overwhelming 
majority of cases, all other things being equal (as long as it doesn't 
come at the expense of more function calls or worse instructions, etc.).

For inlining decisions code generation analysis is arguably more 
complicated - but that's the nature of inlining related patches.

Performance measurements can back up such arguments, and being more 
proficient with perf tooling is useful anyway, but it's fundamentally a 
stochastic argument for something as comparatively small as a +0.12% 
code size increase.

That said, if code generation analysis is inconclusive or even negative, 
then performance measurements can trump all of that. It's a substantial 
barrier to entry as you noted - and I'm somewhat sceptical whether a 
0.12% code generation effect *can* even be measured reliably even with 
the best of expertise and infrastructure...
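
To put a rough shape on that scepticism, here is a back-of-the-envelope 
sketch, not a measurement. It assumes, purely for illustration, that the 
effect shows up as a 0.12% shift in some benchmark metric, that 
run-to-run noise is normally distributed, and that the two kernels are 
compared with a plain two-sample z-test. The function name and the noise 
levels are hypothetical.

```python
import math
from statistics import NormalDist


def runs_needed(effect_pct, noise_pct, alpha=0.05, power=0.80):
    """Benchmark runs per kernel needed to detect a mean shift.

    Standard two-sided, two-sample z-test sample size formula:
        n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2
    effect_pct is the shift to detect (delta) and noise_pct is the
    run-to-run standard deviation (sigma), both in percent of the mean.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * noise_pct / effect_pct) ** 2)


if __name__ == "__main__":
    # Assumed noise levels, chosen only to show how quickly n grows.
    for noise in (0.1, 0.5, 1.0):
        n = runs_needed(0.12, noise)
        print(f"noise {noise}% -> about {n} runs per kernel")
```

The required number of runs grows with the square of the noise-to-effect 
ratio, so once the noise is several times larger than the effect the run 
count gets large very quickly.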

Also, to shorten build & test times you can use the x86-64 defconfig. 
It's a config more or less representative of what major distros enable, 
and it's even bootable on some systems and in VMs, but it builds in far 
less time.

Thanks,

	Ingo
