Message-ID: <20250303131234.0a2e20e4@pumpkin>
Date: Mon, 3 Mar 2025 13:12:34 +0000
From: David Laight <david.laight.linux@...il.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Uros Bizjak <ubizjak@...il.com>, x86@...nel.org,
linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>, Thomas
Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...nel.org>, Borislav
Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter
Anvin" <hpa@...or.com>
Subject: Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic
locking insns
On Fri, 28 Feb 2025 14:58:47 -0800
Dave Hansen <dave.hansen@...el.com> wrote:
> On 2/28/25 14:31, Uros Bizjak wrote:
> > On Fri, Feb 28, 2025 at 5:48 PM Dave Hansen <dave.hansen@...el.com> wrote:
> >>
> >> On 2/28/25 04:35, Uros Bizjak wrote:
> >>> The code size of the resulting x86_64 defconfig object file increases
> >>> by 33.264 kbytes, representing a 0.12% code size increase:
> >>>
> >>>     text    data    bss      dec     hex filename
> >>> 27450107 4633332 814148 32897587 1f5fa33 vmlinux-old.o
> >>> 27483371 4633784 814148 32931303 1f67de7 vmlinux-new.o
> >>
> >> So, first of all, thank you for including some objective measurement of
> >> the impact of your patches. It's much appreciated.
> >>
> >> But I think the patches need to come with a solid theory of why they're
> >> good. The minimum bar for that, I think, is *some* kind of actual
> >> real-world performance test. I'm not picky. Just *something* that spends
> >> a lot of time in the kernel and ideally where a profile points at some
> >> of the code you're poking here.
> >>
> >> I'm seriously not picky: will-it-scale, lmbench, dbench, kernel
> >> compiles. *ANYTHING*. *ANY* hardware. Run it on your laptop.
> >>
> >> But performance patches need to come with performance *numbers*.
> >
> > I don't consider this patch a performance patch, it is more a patch
> > that fixes a correctness issue. The compiler estimates the number of
> > instructions in the asm template wrong, so the patch instructs the
> > compiler that everything in the template in fact results in a single
> > instruction, no matter the pseudos there. The correct estimation then
> > allows the compiler to do its job better (e.g. better scheduling,
> > better inlining decisions, etc...).
>
> Why does it matter if the compiler does its job better?
>
> I'll let the other folks who maintain this code chime in if they think
> I'm off my rocker. But, *I* consider this -- and all of these, frankly
> -- performance patches.
I was looking at some size changes related to a different 'trivial'
code change.
It caused gcc to make apparently unrelated inlining decisions that made
some functions grow or shrink by 100+ bytes, even though the actual
change would mostly only add or remove a single instruction.

I've lost the patch for this one, but if the asm block does expand to a
single instruction, it is likely to make gcc decide to inline one of the
functions that uses it, so increasing overall code size.
Whether that helps or hinders performance is difficult to say.

David
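
For readers following the thread, here is a minimal user-space sketch of the
effect being discussed. It is not the kernel's code: DEMO_ASM_INLINE,
DEMO_LOCK_PREFIX, the section name "demo_locks" and the counter_* functions
are all made up for illustration, loosely modelled on a lock-prefix template
that emits bookkeeping directives around one real instruction. GCC estimates
the size of an asm statement from the number of lines and separators in the
template, so the plain variant looks like several instructions to the
inliner, while the "inline" qualifier asks the compiler to treat it as the
minimum size.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's asm_inline macro: only use the
 * "inline" asm qualifier where the compiler is assumed to accept it. */
#if (defined(__clang__) && __clang_major__ >= 11) || \
    (!defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 9)
#define DEMO_ASM_INLINE __asm__ __inline
#else
#define DEMO_ASM_INLINE __asm__
#endif

#if (defined(__x86_64__) || defined(__i386__)) && defined(__ELF__)
/* Multi-line template: four assembler directives and a label, followed by
 * a single locked instruction. "demo_locks" is an invented section name. */
#define DEMO_LOCK_PREFIX            \
    ".pushsection demo_locks,\"a\"\n" \
    ".balign 4\n"                   \
    ".long 671f - .\n"              \
    ".popsection\n"                 \
    "671:\n\tlock "

/* Plain asm: the size estimate is based on the template's line count. */
static inline void counter_inc_plain(int *v)
{
    __asm__ volatile(DEMO_LOCK_PREFIX "incl %0" : "+m"(*v) : : "memory");
}

/* asm inline: same template, but counted as minimum size for inlining. */
static inline void counter_inc_hinted(int *v)
{
    DEMO_ASM_INLINE volatile(DEMO_LOCK_PREFIX "incl %0" : "+m"(*v) : : "memory");
}
#else
/* Fallback for other targets so the sketch still builds; the size-estimate
 * question does not arise here because no asm template is involved. */
static inline void counter_inc_plain(int *v)
{
    __atomic_fetch_add(v, 1, __ATOMIC_SEQ_CST);
}

static inline void counter_inc_hinted(int *v)
{
    __atomic_fetch_add(v, 1, __ATOMIC_SEQ_CST);
}
#endif

/* Both variants perform the same atomic increment; only the inliner's
 * cost estimate for the callers differs. */
static int demo_run(void)
{
    int counter = 0;

    counter_inc_plain(&counter);
    assert(counter == 1);
    counter_inc_hinted(&counter);
    return counter;
}
```

Both functions generate the same locked increment, so any difference shows
up only in the inlining decisions for their callers. Comparing `size` output
for an object built each way is one way to see whether callers grew or
shrank, which is the kind of knock-on change described above.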