Message-ID: <20251016103233.GC3289052@noisy.programming.kicks-ass.net>
Date: Thu, 16 Oct 2025 12:32:33 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: David Kaplan <david.kaplan@....com>
Cc: Thomas Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...en8.de>,
	Josh Poimboeuf <jpoimboe@...nel.org>,
	Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
	Ingo Molnar <mingo@...hat.com>,
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
	"H . Peter Anvin" <hpa@...or.com>, Alexander Graf <graf@...zon.com>,
	Boris Ostrovsky <boris.ostrovsky@...cle.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 31/56] x86/alternative: Prepend nops with retpolines

On Mon, Oct 13, 2025 at 09:34:19AM -0500, David Kaplan wrote:
> When patching retpolines, nops may be required for padding such as when
> turning a 5-byte direct call into a 2-byte indirect call.  Previously,
> these were appended at the end so the code becomes "call *reg;nop;nop;nop"
> for example.  This was fine because it's always going from a larger
> instruction to a smaller one.
> 
> But this is a problem if the sequence is transformed from a 2-byte indirect
> to the 5-byte direct call version at runtime because when the called
> function returns, it will be in the middle of the 5-byte call instruction.
> 
> To fix this, prepend the nops instead of appending them.  Consequently, the
> return site of the called function is always the same.
> 
> For indirect jmps this is potentially slightly less efficient compared to
> appending nops, but indirect jmps are so rare this hardly seems worth
> optimizing.
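
[Editor's sketch, not part of the patch.] The return-address argument in the quoted description can be checked with a little arithmetic. The helper names below (`call_return_offset`, `return_is_stable`, `enum pad_mode`) are hypothetical and do not come from the kernel; the model assumes only that a call returns to the byte immediately after the call instruction.

```c
#include <stddef.h>

/* Hypothetical model of a patch site: one call instruction plus NOP padding. */
enum pad_mode { PAD_APPEND, PAD_PREPEND };

/*
 * Offset (from the start of the site) that the callee returns to.
 * A call returns to the byte right after the call instruction, so the
 * answer depends on whether the padding sits before or after the call.
 */
static size_t call_return_offset(size_t site_len, size_t insn_len,
                                 enum pad_mode mode)
{
    size_t pad = site_len - insn_len;

    return mode == PAD_PREPEND ? pad + insn_len : insn_len;
}

/*
 * True if the short (padded) form and the full-length form of the site
 * return to the same offset, i.e. repatching while the callee is running
 * cannot make it return into the middle of an instruction.
 */
static int return_is_stable(size_t site_len, size_t short_len,
                            enum pad_mode mode)
{
    return call_return_offset(site_len, short_len, mode) ==
           call_return_offset(site_len, site_len, mode);
}
```

For a 5-byte site holding a 2-byte indirect call, appending gives a return offset of 2, which falls inside the 5-byte direct call once the site is repatched. Prepending gives 5, the same offset the direct call returns to.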

Durr.. 

So Retpoline sites can be 5 or 6 bytes, depending on the register. 

Also, I suppose at this point I would prefer prefix stuffing over
multiple instructions. The prefix stuffing ensures it stays a single
instruction.

Alas, some micro-archs have significant decode penalties for >3
prefixes, and filling 6 bytes will need 4 prefixes :-(
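
[Editor's sketch, not from the thread.] To make the prefix count concrete, the snippet below pads a 2-byte `call *%rax` (`ff d0`) out to a given site length with redundant CS segment-override prefixes (`0x2e`). The function name `stuff_prefixes` and the choice of the CS prefix are assumptions made for illustration; this is not the kernel's patching code.

```c
#include <stddef.h>
#include <string.h>

#define CS_PREFIX 0x2e /* CS segment override, used here only as filler */

/*
 * Fill site[0..site_len) with 'insn' preceded by enough CS prefixes to
 * occupy the whole site, so the result is still a single instruction.
 * Returns the number of prefixes emitted, or -1 if 'insn' does not fit.
 */
static int stuff_prefixes(unsigned char *site, size_t site_len,
                          const unsigned char *insn, size_t insn_len)
{
    size_t pad;

    if (insn_len > site_len)
        return -1;

    pad = site_len - insn_len;
    memset(site, CS_PREFIX, pad);       /* leading filler prefixes */
    memcpy(site + pad, insn, insn_len); /* the real instruction last */
    return (int)pad;
}
```

Under these assumptions a 5-byte site needs 3 prefixes and a 6-byte site needs 4, which is the >3 case the message above is worried about.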
