[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAFULd4Y+xWW3Xq+SeRo7n7BS__epyYZC9ygnNU4EmWb+BPzC4w@mail.gmail.com>
Date: Fri, 14 Feb 2025 08:25:55 +0100
From: Uros Bizjak <ubizjak@...il.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: x86@...nel.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>,
Dennis Zhou <dennis@...nel.org>, Tejun Heo <tj@...nel.org>, Christoph Lameter <cl@...ux.com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>
Subject: Re: [PATCH RESEND 2/2] x86/locking: Use asm_inline for
{,try_}cmpxchg{64,128} emulations
On Thu, Feb 13, 2025 at 11:52 PM Dave Hansen <dave.hansen@...el.com> wrote:
>
> On 2/13/25 14:13, Uros Bizjak wrote:
> > On Thu, Feb 13, 2025 at 9:48 PM Dave Hansen <dave.hansen@...el.com> wrote:
> >> On 2/13/25 11:14, Uros Bizjak wrote:
> >>> According to [1], the usage of asm pseudo directives in the asm template
> >>> can confuse the compiler to wrongly estimate the size of the generated
> >>> code. ALTERNATIVE macro expands to several asm pseudo directives, so
> >>> its usage in {,try_}cmpxchg{64,128} causes instruction length estimate
> >>> to fail by an order of magnitude (the compiler estimates the length of
> >>> an asm to be more than 20 instructions).
> >>
> >> Just curious, but how did you come up with the "20 instructions" number?
> >
> > Currently, a patched GCC compiler is needed (please see
> > asm_insn_count() and asm_str_count() functions in gcc/final.cc on how
> > the asm length is calculated) to report the length. For historic
> > reasons, the length of asm is not printed in asm dumps, but recently a
> > GCC PR was filled with a request to change this).
>
> So, that's also good info to add. You can even do it in the changelog
> with little more space than the existing changelog:
>
> ... fail by an order of magnitude (a hacked-up gcc shows that it
> estimates the length of an asm to be more than 20 instructions).
>
> ...
> >> Is any of this measurable? Is there any objective data to support that
> >> this change is a good one?
> >
> > Actually, "asm inline" was added to the GCC compiler just for this
> > purpose by request from the linux community [1].
>
> Wow, that's really important important information. Shouldn't the fact
> that this is leveraging a new feature that we asked for specifically get
> called out somewhere?
>
> Who asked for it? Are they on cc? Do they agree that this feature fills
> the gap they wanted filled?
asm_inline is already used in some 40-50 places throughout the tree,
but there still remain some places that could benefit from it.
> > My patch follows the
> > example of other similar macros (e.g. arch/x86/include/alternative.h)
> > and adds the same cure to asms that will undoubtedly result in a
> > single instruction [*]. The benefit is much more precise length
> > estimation, so compiler heuristic is able to correctly estimate the
> > benefit of inlining, not being skewed by excessive use of
> > __always_inline directive. OTOH, it is hard to back up compiler
> > decisions by objective data, as inlining decisions depend on several
> > factors besides function size (e.g. how hot/cold is function), so a
> > simple comparison of kernel sizes does not present the full picture.
>
> Yes, the world is complicated. But, honestly, one data point is a
> billion times better than zero. Right now, we're at zero.
>
> >> It's quite possible that someone did the "asm" on purpose because
> >> over-estimating the size was a good thing.
> >
> > I doubt this would be the case, and I would consider the code that
> > depends on this detail defective. The code that results in one asm
> > instruction should be accounted as such, no matter what internal
> > details are exposed in the instruction asm template.
>
> Yeah, but defective or not, if this causes a regression, it's either not
> getting applied to gets reverted.
>
> All that I'm asking here is that someone look at the kernel after the
> patch gets applied and sanity check it. Absolutely basic scientific
> method stuff. Make a hypothesis about what it will do:
>
> 1. Inline these locking functions
> 2. Make the kernel go faster for _something_
>
> and if it doesn't match the hypothesis, then try and figure out why. You
> don't have to do every config or every compiler. Just do one config and
> one modern compiler.
>
> Right now, this patch is saying:
>
> 1. gcc appears to have done something that might be suboptimal
> 2. gcc has a new feature that might make it less suboptimal
> 3. here's a patch that should optimize things
> ...
>
> but then it leaves us hanging. There's a lot of "mights" and "shoulds"
> in there, but nothing that shows that this actually does anything
> positive in practice.
Let me harvest some data and report the findings in a V2 ChangeLog.
However, these particular macros are rarely used, so I don't expect
some big changes in the generated asm code.
> Maybe I'm just a dummy and this is just an obvious improvement that I
> can't grasp. If so, sorry for being so dense, but I'm going to need a
> little more education before this gets applied.
Thanks,
Uros.
Powered by blists - more mailing lists