Message-ID: <CAFULd4YVOEtT+bsp9H7ijaoJn2e2108tWhiFarRv=QxoUMZaiw@mail.gmail.com>
Date: Sat, 1 Mar 2025 10:05:56 +0100
From: Uros Bizjak <ubizjak@...il.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic locking insns

On Fri, Feb 28, 2025 at 11:58 PM Dave Hansen <dave.hansen@...el.com> wrote:
>
> On 2/28/25 14:31, Uros Bizjak wrote:
> > On Fri, Feb 28, 2025 at 5:48 PM Dave Hansen <dave.hansen@...el.com> wrote:
> >>
> >> On 2/28/25 04:35, Uros Bizjak wrote:
> >>> The code size of the resulting x86_64 defconfig object file increases
> >>> by 33.264 kbytes, a 0.12% increase in code size:
> >>>
> >>>     text    data     bss      dec     hex filename
> >>> 27450107 4633332  814148 32897587 1f5fa33 vmlinux-old.o
> >>> 27483371 4633784  814148 32931303 1f67de7 vmlinux-new.o
> >>
> >> So, first of all, thank you for including some objective measurement of
> >> the impact of your patches. It's much appreciated.
> >>
> >> But I think the patches need to come with a solid theory of why they're
> >> good. The minimum bar for that, I think, is *some* kind of actual
> >> real-world performance test. I'm not picky. Just *something* that spends
> >> a lot of time in the kernel and ideally where a profile points at some
> >> of the code you're poking here.
> >>
> >> I'm seriously not picky: will-it-scale, lmbench, dbench, kernel
> >> compiles. *ANYTHING*. *ANY* hardware. Run it on your laptop.
> >>
> >> But performance patches need to come with performance *numbers*.
> >
> > I don't consider this a performance patch; it is more a patch that
> > fixes a correctness issue. The compiler misestimates the number of
> > instructions in the asm template, so the patch tells the compiler
> > that everything in the template in fact results in a single
> > instruction, no matter the pseudos there. The correct estimate then
> > allows the compiler to do its job better (e.g. better scheduling,
> > better inlining decisions, etc.).
>
> Why does it matter if the compiler does its job better?

Please read the long thread [1], especially part [1.1], which was the
reason GCC implemented asm inline [2].

[1] https://lore.kernel.org/lkml/20181003213100.189959-1-namit@vmware.com/
[1.1] https://lore.kernel.org/lkml/20181007091805.GA30687@zn.tnic/
[2] https://gcc.gnu.org/pipermail/gcc-patches/2018-December/512349.html
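
To illustrate the estimation problem, here is a minimal userspace sketch
(not the actual kernel code). The names demo_asm_inline, DEMO_LOCK_PREFIX,
.demo_lock_sites, demo_atomic_inc and demo_count_to are made up for this
example, and the template is only loosely modelled on the kernel's
LOCK_PREFIX. The template spans several lines but emits a single
instruction into the text section. Without the inline qualifier, GCC
estimates the size of the asm from the number of lines and statement
separators in the template; with it, the asm is counted at the smallest
possible size for inlining purposes.

```c
/*
 * Hypothetical stand-in for the kernel's asm_inline macro. The "inline"
 * asm qualifier is used only where the compiler is assumed to accept it
 * (GCC >= 9, clang >= 11); otherwise this falls back to plain asm.
 */
#if (defined(__clang__) && __clang_major__ >= 11) || \
    (!defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 9)
#define demo_asm_inline __asm__ __inline__
#else
#define demo_asm_inline __asm__
#endif

#if (defined(__x86_64__) || defined(__i386__)) && defined(__ELF__)
/*
 * Multi-line template: four lines of section bookkeeping that add nothing
 * to the text section, a local label, and then the lock prefix. A line
 * counting estimate sees far more than the one instruction that is emitted.
 */
#define DEMO_LOCK_PREFIX                        \
	".pushsection .demo_lock_sites,\"a\"\n" \
	".balign 4\n"                           \
	".long 671f - .\n"                      \
	".popsection\n"                         \
	"671:\n\tlock; "

/* Atomically increment *counter with a single lock-prefixed instruction. */
static inline void demo_atomic_inc(int *counter)
{
	demo_asm_inline volatile(DEMO_LOCK_PREFIX "incl %0"
				 : "+m" (*counter)
				 : : "memory");
}
#else
/* Portable fallback so the sketch also builds on other targets. */
static inline void demo_atomic_inc(int *counter)
{
	__atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);
}
#endif

/* Small caller whose inlining decision depends on the size estimate. */
int demo_count_to(int n)
{
	int counter = 0;

	for (int i = 0; i < n; i++)
		demo_atomic_inc(&counter);
	return counter;
}
```

Calling demo_count_to(5) increments a local counter five times through
the asm statement. Comparing the generated code with and without the
inline qualifier at -O2 is one way to see whether the estimate changes
the compiler's decisions for a given caller.
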

Accurate inlining decisions are just one of the compiler optimizations
that depend on the code growth estimate. Tail duplication [3] is another,
and there are also code hoisting, function cloning, block reordering and
basic block copying, to name a few off the top of my head.

[3] https://gcc.gnu.org/projects/sched-treegion.html

These all work better with accurate input data. These optimizations
are also the reason for the roughly 0.1% code growth with -O2:
additional code blocks now fall under the code size threshold that
enables the mentioned optimizations, under the assumption of -O2 code
size/performance tradeoffs. OTOH, -Os, where different code
size/performance heuristics are used, now performs better w.r.t. code
size.

Uros.