Message-ID: <e0021746-d43c-4c45-83b6-bcf3982b2548@citrix.com>
Date: Wed, 2 Apr 2025 19:40:01 +0100
From: Andrew Cooper <andrew.cooper3@...rix.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: mjguzik@...il.com, linux-kernel@...r.kernel.org, mingo@...hat.com,
x86@...nel.org, "Peter Zijlstra (Intel)" <peterz@...radead.org>
Subject: Re: [RFC PATCH] x86: prevent gcc from emitting rep movsq/stosq for
inlined ops
On 02/04/2025 7:29 pm, Linus Torvalds wrote:
> On Wed, 2 Apr 2025 at 11:17, Andrew Cooper <andrew.cooper3@...rix.com> wrote:
>> Taking a leaf out of the retpoline book, the ideal library call(s) would be:
>>
>>     CALL __x86_thunk_rep_{mov,stos}sb
>>
>> using the REP ABI (parameters in %rcx/%rdi/etc), rather than the SYSV ABI.
> Yes. That's basically what 'rep_movs_alternative' does so that we can
> basically do a
>
>     ALTERNATIVE("rep movsb",
>                 "call rep_movs_alternative",
>                 ALT_NOT(X86_FEATURE_FSRM))
>
> but we only do this for user space copies exactly because we don't
> have a good way to do it for compiler-generated ones.
>
> If gcc just did an out-of-line call, but used the 'rep movs' "calling
> convention", we would be able to basically do the rewriting
> dynamically, replacing the call with an inlined "rep movsb" where
> appropriate.
You still want the compiler to be able to do a first-pass optimisation
over __builtin_mem*(), for elimination/merging/etc, but if it could stop
halfway through what it currently does and just emit the library call,
that would be excellent.
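
To illustrate the "REP ABI" discussed above, here is a minimal C sketch using GCC-style inline assembly. It is not kernel code and not what any compiler emits today: the name rep_movsb_copy is hypothetical, and the byte-loop fallback for non-x86 targets exists only so the sketch builds everywhere. The point is that the operands are pinned to the registers "rep movsb" itself consumes (destination in %rdi, source in %rsi, count in %rcx), which is what would let a call site and an inlined "rep movsb" be swapped for each other without moving any arguments.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy len bytes from src to dst using the register
 * assignment that "rep movsb" requires. Returns the original dst.
 */
static inline void *rep_movsb_copy(void *dst, const void *src, size_t len)
{
    void *ret = dst;

#if defined(__x86_64__) || defined(__i386__)
    /*
     * "D", "S" and "c" pin the operands to (r|e)di, (r|e)si and (r|e)cx.
     * All three are read and written by the instruction, hence "+".
     * The direction flag is assumed clear, as the C ABI guarantees.
     */
    __asm__ __volatile__("rep movsb"
                         : "+D"(dst), "+S"(src), "+c"(len)
                         :
                         : "memory");
#else
    /* Assumption: plain byte loop so the sketch still builds off x86. */
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len--)
        *d++ = *s++;
#endif

    return ret;
}
```

Calling rep_movsb_copy(dst, src, n) behaves like memcpy() for non-overlapping buffers. A thunk following the scheme above would take its inputs in the same three registers, so patching a "call" into an inline "rep movsb" (or back) would need no argument shuffling.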
~Andrew