Message-ID: <CAFULd4bpHGE83qc37sbh=rpGj+SFqQrsNDLzL_-NQpo6pQH3jw@mail.gmail.com>
Date: Fri, 28 Feb 2025 23:31:08 +0100
From: Uros Bizjak <ubizjak@...il.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic locking insns

On Fri, Feb 28, 2025 at 5:48 PM Dave Hansen <dave.hansen@...el.com> wrote:
>
> On 2/28/25 04:35, Uros Bizjak wrote:
> > The code size of the resulting x86_64 defconfig object file increases
> > by 33.264 kbytes, representing a 0.12% code size increase:
> >
> >     text    data    bss      dec     hex filename
> > 27450107 4633332 814148 32897587 1f5fa33 vmlinux-old.o
> > 27483371 4633784 814148 32931303 1f67de7 vmlinux-new.o
>
> So, first of all, thank you for including some objective measurement of
> the impact of your patches. It's much appreciated.
>
> But I think the patches need to come with a solid theory of why they're
> good. The minimum bar for that, I think, is *some* kind of actual
> real-world performance test. I'm not picky. Just *something* that spends
> a lot of time in the kernel and ideally where a profile points at some
> of the code you're poking here.
>
> I'm seriously not picky: will-it-scale, lmbench, dbench, kernel
> compiles. *ANYTHING*. *ANY* hardware. Run it on your laptop.
>
> But performance patches need to come with performance *numbers*.

I don't consider this a performance patch; it is more of a fix for a
correctness issue. The compiler wrongly estimates the number of
instructions in the asm template, so the patch tells the compiler
that everything in the template in fact results in a single
instruction, no matter what pseudos it contains. The correct estimate
then allows the compiler to do its job better (e.g. better scheduling,
better inlining decisions, etc.).
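
To illustrate the point, here is a minimal sketch, not the kernel's
actual code. It assumes GCC's documented behavior of estimating the size
of an asm statement from the number of lines in its template, and the
"inline" asm qualifier available since GCC 9. The names demo_asm_inline,
DEMO_LOCK_PREFIX, demo_atomic_add and demo_sum_to are hypothetical and
exist only for this example; the pseudo-op padding merely imitates a
template that carries bookkeeping directives around one real instruction.

```c
/*
 * Hypothetical stand-in for an asm_inline style macro. GCC 9 and later
 * accept the "inline" qualifier on asm statements; other compilers fall
 * back to a plain asm statement in this sketch.
 */
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 9)
# define demo_asm_inline __asm__ __inline
#else
# define demo_asm_inline __asm__
#endif

/*
 * Hypothetical prefix: four lines of assembler pseudo-ops that record the
 * address of the locked instruction in a side section, followed by the
 * lock prefix itself. None of the pseudo-ops emits code at the call site,
 * but each line counts toward the compiler's size estimate.
 */
#define DEMO_LOCK_PREFIX \
	".pushsection .demo_locks,\"a\"\n" \
	".balign 4\n" \
	".long 671f - .\n" \
	".popsection\n" \
	"671:\n\tlock "

/* Atomically add i to *v. */
static inline void demo_atomic_add(int i, int *v)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__ELF__)
	/*
	 * The template spans six lines but assembles to one instruction.
	 * With the "inline" qualifier the compiler treats the statement
	 * as having the minimum possible size.
	 */
	demo_asm_inline volatile(DEMO_LOCK_PREFIX "addl %1,%0"
				 : "+m" (*v)
				 : "ir" (i)
				 : "memory");
#else
	/* Portable fallback so the sketch builds on other targets. */
	__atomic_fetch_add(v, i, __ATOMIC_SEQ_CST);
#endif
}

/* Small caller: sums 1..n through the atomic helper. */
static inline int demo_sum_to(int n)
{
	int counter = 0;
	int i;

	for (i = 1; i <= n; i++)
		demo_atomic_add(i, &counter);

	return counter;
}
```

The generated instruction is the same with or without the qualifier;
only the size estimate that feeds the inlining and scheduling heuristics
of the callers changes.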

Code size is an excellent metric for an -Os compile, but not so good
for an -O2 compile, and the measured results mirror that. In the -O2
case, we actually asked the compiler to prioritize performance, not
code size, so the code size measurements are only of limited
relevance. The purpose of these measurements is to show that the
effect of the patch is limited to the expected 1% of code size
difference.

I don't expect any noticeable performance changes from a
non-algorithmic patch like this. TBH, I would be surprised if they
were outside the measurement noise. Nevertheless, I'll try to provide
some performance numbers.

Thanks,
Uros.