linux-kernel - Re: [RFC PATCH 31/56] x86/alternative: Prepend nops with retpolines

Message-ID: <20251016103233.GC3289052@noisy.programming.kicks-ass.net>
Date: Thu, 16 Oct 2025 12:32:33 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: David Kaplan <david.kaplan@....com>
Cc: Thomas Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...en8.de>,
	Josh Poimboeuf <jpoimboe@...nel.org>,
	Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
	Ingo Molnar <mingo@...hat.com>,
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
	"H . Peter Anvin" <hpa@...or.com>, Alexander Graf <graf@...zon.com>,
	Boris Ostrovsky <boris.ostrovsky@...cle.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 31/56] x86/alternative: Prepend nops with retpolines

On Mon, Oct 13, 2025 at 09:34:19AM -0500, David Kaplan wrote:
> When patching retpolines, nops may be required for padding such as when
> turning a 5-byte direct call into a 2-byte indirect call.  Previously,
> these were appended at the end so the code becomes "call *reg;nop;nop;nop"
> for example.  This was fine because it's always going from a larger
> instruction to a smaller one.
> 
> But this is a problem if the sequence is transformed from a 2-byte indirect
> to the 5-byte direct call version at runtime because when the called
> function returns, it will be in the middle of the 5-byte call instruction.
> 
> To fix this, prepend the nops instead of appending them.  Consequently, the
> return site of the called function is always the same.
> 
> For indirect jmps this is potentially slightly less efficient compared to
> appending nops, but indirect jmps are so rare this hardly seems worth
> optimizing.

Durr.. 

So Retpoline sites can be 5 or 6 bytes, depending on the register. 

Also, I suppose at this point I would prefer prefix stuffing
over multiple instructions. The prefix stuffing ensures it stays a
single instruction.

Alas, some micro-archs have significant decode penalties for >3
prefixes, and filling 6 bytes will need 4 prefixes :-(
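
To make the byte counting concrete, here is a minimal sketch (not the kernel's code) of the two padding styles discussed above for a patched "call *%reg" site. The helper names encode_call_reg() and fill_site() are hypothetical, single-byte 0x90 nops stand in for whatever nop sequence the kernel would actually pick, and the CS segment override (0x2e) is assumed to be the padding prefix. In both styles the call ends on the last byte of the site, so the return address does not move.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP1      0x90  /* single-byte nop (simplification, see lead-in) */
#define CS_PREFIX 0x2e  /* CS segment override, assumed padding prefix */

enum pad_style { PAD_NOPS_FIRST, PAD_PREFIXES };

/* Encode "call *%reg" for reg 0..15. Returns 2, or 3 when REX.B is needed. */
static size_t encode_call_reg(uint8_t *out, unsigned reg)
{
    size_t n = 0;

    assert(reg < 16);
    if (reg >= 8)
        out[n++] = 0x41;             /* REX.B selects r8..r15 */
    out[n++] = 0xff;                 /* opcode group 5 */
    out[n++] = 0xd0 | (reg & 7);     /* ModRM: mod=11, /2 (call), rm=reg */
    return n;
}

/*
 * Fill site[0..site_len) so the call occupies the final bytes of the site.
 * Returns the number of prefixes (REX included) on the call instruction.
 */
static unsigned fill_site(uint8_t *site, size_t site_len, unsigned reg,
                          enum pad_style style)
{
    uint8_t insn[3];
    size_t len = encode_call_reg(insn, reg);
    unsigned rex = reg >= 8;
    size_t pad;

    assert(site_len >= len);
    pad = site_len - len;

    /* Padding goes first: separate nops, or legacy prefixes before REX. */
    memset(site, style == PAD_PREFIXES ? CS_PREFIX : NOP1, pad);
    memcpy(site + pad, insn, len);

    return style == PAD_PREFIXES ? (unsigned)pad + rex : rex;
}
```

Under these assumptions, a 5-byte site with %rax becomes "2e 2e 2e ff d0" (3 prefixes), while a 6-byte site with %r8 becomes "2e 2e 2e 41 ff d0", which is the 4-prefix case that runs into the decode penalty.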