Message-ID: <20251016103233.GC3289052@noisy.programming.kicks-ass.net>
Date: Thu, 16 Oct 2025 12:32:33 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: David Kaplan <david.kaplan@....com>
Cc: Thomas Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...en8.de>,
Josh Poimboeuf <jpoimboe@...nel.org>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Ingo Molnar <mingo@...hat.com>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H . Peter Anvin" <hpa@...or.com>, Alexander Graf <graf@...zon.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 31/56] x86/alternative: Prepend nops with retpolines
On Mon, Oct 13, 2025 at 09:34:19AM -0500, David Kaplan wrote:
> When patching retpolines, nops may be required for padding such as when
> turning a 5-byte direct call into a 2-byte indirect call. Previously,
> these were appended at the end so the code becomes "call *reg;nop;nop;nop"
> for example. This was fine because it's always going from a larger
> instruction to a smaller one.
>
> But this is a problem if the sequence is transformed from a 2-byte indirect
> to the 5-byte direct call version at runtime because when the called
> function returns, it will be in the middle of the 5-byte call instruction.
>
> To fix this, prepend the nops instead of appending them. Consequently, the
> return site of the called function is always the same.
>
> For indirect jmps this is potentially slightly less efficient compared to
> appending nops, but indirect jmps are so rare this hardly seems worth
> optimizing.
Durr..
So Retpoline sites can be 5 or 6 bytes, depending on the register.
Also, I suppose at this point I would prefer prefix stuffing
over multiple instructions. The prefix stuffing ensures it stays a
single instruction.
Alas, some micro-archs have significant decode penalties for >3
prefixes, and filling 6 bytes will need 4 prefixes :-(