Message-ID: <20250402182241.GY5880@noisy.programming.kicks-ass.net>
Date: Wed, 2 Apr 2025 20:22:41 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Andrew Cooper <andrew.cooper3@...rix.com>
Cc: mjguzik@...il.com, linux-kernel@...r.kernel.org, mingo@...hat.com,
torvalds@...ux-foundation.org, x86@...nel.org
Subject: Re: [RFC PATCH] x86: prevent gcc from emitting rep movsq/stosq for
inlined ops
On Wed, Apr 02, 2025 at 07:17:03PM +0100, Andrew Cooper wrote:
> > Please make this a gcc bug-report instead - I really don't want to
> > have random compiler-specific tuning options in the kernel. Because
> > that whole memcpy-strategy thing is something that gets tuned by a lot
> > of other compiler options (ie -march and different versions).
>
> I've discussed this with PeterZ in the past, although I can't for the
> life of me find the bugzilla ticket I thought I opened on the matter.
> (Maybe I never got that far).
>
> The behaviour wanted is:
>
> 1) Convert to plain accesses (so they can be merged/combined/etc), or
> 2) Emit a library call
>
> because we do provide forms that are better than the GCC-chosen "REP
> MOVSQ with manual alignment" in the general case.
>
> Taking a leaf out of the retpoline book, the ideal library call(s) would be:
>
> CALL __x86_thunk_rep_{mov,stos}sb
>
> using the REP ABI (parameters in %rcx/%rdi/etc), rather than the SYSV ABI.
>
> For current/future processors, which have fast reps of all short/zero
> flavours, we can even inline the REP {MOV,STO}S instruction to avoid the
> call.
>
> For older microarchitectures, they can reuse the existing memcpy/memset
> implementations, just with marginally less parameter shuffling.
>
> How does this sound?
Right, vague memories indeed. We do something like this manually for
copy_user_generic().
But it would indeed be very nice if the compiler were to emit such thunk
calls instead of open-coding 'rep movs'/'rep stos'. Then objtool can
collect the call sites and we can patch each one at runtime to be an
inline 'rep movs' or not, depending on CPU feature flags etc.
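
To make the idea concrete, here is a rough user-space sketch, not kernel code. It only models the selection step: a function pointer stands in for a call site that would really be rewritten in place at boot. All names (cpu_has_fast_rep, copy_thunk, patch_copy_thunk and the two copy variants) are made up for illustration, and the flag is a plain variable rather than a real CPUID check. The inline variant takes its operands the "REP ABI" way, in %rdi/%rsi/%rcx, and falls back to a byte loop on non-x86-64 builds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a CPU feature flag (think "fast short REP"). */
static int cpu_has_fast_rep;

/* Variant 1: what the patched-in inline form would do, REP MOVSB with
 * the operands already in the registers the instruction expects. */
static void copy_rep_movsb(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	/* Relies on the direction flag being clear, as the ABI guarantees. */
	__asm__ volatile("rep movsb"
			 : "+D"(dst), "+S"(src), "+c"(len)
			 :
			 : "memory");
#else
	/* Portable fallback so the sketch builds on other targets. */
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len--)
		*d++ = *s++;
#endif
}

/* Variant 2: the out-of-line routine an older CPU would keep calling. */
static void copy_library(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/* Models the thunk call site; defaults to the out-of-line routine. */
static void (*copy_thunk)(void *, const void *, size_t) = copy_library;

/* Models the one-time runtime patching pass driven by CPU flags. */
static void patch_copy_thunk(void)
{
	copy_thunk = cpu_has_fast_rep ? copy_rep_movsb : copy_library;
	assert(copy_thunk != NULL);
}
```

The function pointer is the main simplification here: with real patching the indirect call disappears, and the call instruction itself is overwritten with either the REP instruction or a direct call.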