Message-ID: <Z8l7KeVvvHvmPmRc@gmail.com>
Date: Thu, 6 Mar 2025 11:38:33 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Uros Bizjak <ubizjak@...il.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic
locking insns
* Uros Bizjak <ubizjak@...il.com> wrote:
> > So these are roughly the high level requirements around such patches.
> > Does this make sense?
>
> In my opinion, the writing above makes perfect sense. As far as I'm
> concerned, I can do the above code analysis; the problematic part
> would be precise performance measurements. Although with your
> instructions, I can also try that.
Yeah, so *personally* I find the kind of code generation analysis you
routinely perform for your micro-optimization patches far more useful
and persuasive, because it's basically a first principles argument:
instructions removed are an inarguable positive in the overwhelming
majority of cases, all other things being equal (as long as it doesn't
come at the expense of more function calls or worse instructions, etc.).
For inlining decisions, code generation analysis is arguably more
complicated - but that's the nature of inlining-related patches.
Performance measurements can back up such arguments, and proficiency
in perf tooling is useful to have anyway, but it's fundamentally a
stochastic argument for something as comparatively small as a +0.12%
code size increase.
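For reference, a code size delta like this can be derived from the `text` column that `size` prints for the two builds. The following is only a minimal sketch: the helper names are made up for illustration, and the numbers in the sample outputs are invented to produce a 0.12% change, not taken from the actual patch.

```python
def text_size(size_output: str) -> int:
    """Return the 'text' column from Berkeley-format `size` output."""
    lines = [ln for ln in size_output.strip().splitlines() if ln.strip()]
    header = lines[0].split()   # column names: text, data, bss, dec, hex, filename
    values = lines[1].split()   # the matching row for one binary
    return int(values[header.index("text")])


def relative_change_percent(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Invented sample outputs, for illustration only (not real measurements).
BEFORE = """\
   text    data     bss     dec     hex filename
1000000  200000   50000 1250000  1312d0 vmlinux.before
"""
AFTER = """\
   text    data     bss     dec     hex filename
1001200  200000   50000 1251200  131780 vmlinux.after
"""

if __name__ == "__main__":
    delta = relative_change_percent(text_size(BEFORE), text_size(AFTER))
    print(f"text size change: {delta:+.2f}%")
```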
But if code generation analysis is inconclusive or even negative, then
performance measurements can trump all of that. They are a substantial
barrier to entry though, as you noted - and I'm somewhat sceptical
whether a 0.12% code generation effect *can* be measured reliably,
even with the best of expertise and infrastructure...
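To put a rough number on that scepticism, here is a back-of-the-envelope sketch using the textbook sample-size formula for comparing two means. Everything in it is an assumption: independent, roughly normally distributed benchmark runs with equal variance, a 1% run-to-run noise level, and a hypothetical runtime effect of 0.12% (the 0.12% above is a code size figure, the real runtime effect is unknown and may well be smaller). The function name is made up for illustration.

```python
import math
from statistics import NormalDist


def runs_per_variant(noise_pct: float, effect_pct: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Estimate benchmark runs needed per kernel variant.

    noise_pct:  assumed run-to-run standard deviation, in percent of the mean
    effect_pct: assumed true difference between the variants, in percent
    alpha:      two-sided false-positive rate
    power:      probability of detecting the effect if it is real
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    # Two-sample comparison of means with equal variance:
    #   n = 2 * ((z_alpha + z_beta) * sigma / delta)^2
    return math.ceil(2.0 * ((z_alpha + z_beta) * noise_pct / effect_pct) ** 2)


if __name__ == "__main__":
    # Assumed inputs: 1% noise, 0.12% effect.
    print(runs_per_variant(noise_pct=1.0, effect_pct=0.12))
```

With these assumed inputs the estimate lands on the order of a thousand runs per variant, and it grows with the square of the noise-to-effect ratio, so any systematic noise source makes it worse.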
Also, to shorten build & test times you can use the x86-64 defconfig.
It's a config more or less representative of what major distros enable,
and it's even bootable on some systems and in VMs, but it builds in far
less time.
Thanks,
Ingo