Message-ID: <20251013143444.3999-32-david.kaplan@amd.com>
Date: Mon, 13 Oct 2025 09:34:19 -0500
From: David Kaplan <david.kaplan@....com>
To: Thomas Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...en8.de>,
Peter Zijlstra <peterz@...radead.org>, Josh Poimboeuf <jpoimboe@...nel.org>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>, Ingo Molnar
<mingo@...hat.com>, Dave Hansen <dave.hansen@...ux.intel.com>,
<x86@...nel.org>, "H . Peter Anvin" <hpa@...or.com>
CC: Alexander Graf <graf@...zon.com>, Boris Ostrovsky
<boris.ostrovsky@...cle.com>, <linux-kernel@...r.kernel.org>
Subject: [RFC PATCH 31/56] x86/alternative: Prepend nops with retpolines

When patching retpolines, NOPs may be required for padding, such as when
turning a 5-byte direct call to the retpoline thunk into a 2-byte indirect
call. Previously, these NOPs were appended, so the code became, for example,
"call *reg; nop; nop; nop". That was fine because patching only ever went
from a larger instruction to a smaller one.
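
For illustration, taking %rax as the example register and the usual x86
encodings (e8 rel32 for the direct call to the thunk, ff d0 for "call *%rax",
90 for a one-byte NOP), the appended scheme rewrites

  e8 xx xx xx xx      call __x86_indirect_thunk_rax

into

  ff d0 90 90 90      call *%rax; nop; nop; nop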

But this is a problem if the sequence is transformed back from the 2-byte
indirect call to the 5-byte direct call version at runtime: a return address
pushed by the 2-byte call points just past it, i.e. into the middle of the
5-byte call instruction, so a thread returning from the called function would
land mid-instruction.

To fix this, prepend the NOPs instead of appending them. This way, the return
site of the call is always the same, regardless of which form is patched in.
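
Continuing the illustration above (same example register and encodings), the
site instead becomes

  90 90 90 ff d0      nop; nop; nop; call *%rax

so the return address pushed by the call is at offset +5 in both the direct
and the indirect form, and re-patching between the two is safe even for
threads currently inside the callee.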

For indirect jmps this is potentially slightly less efficient compared to
appending nops, but indirect jmps are so rare this hardly seems worth
optimizing.

Signed-off-by: David Kaplan <david.kaplan@....com>
---
arch/x86/kernel/alternative.c | 27 ++++++++++++++++++++++++---
1 file changed, 24 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 8ee5ff547357..7a1f17078581 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -854,6 +854,21 @@ static bool cpu_wants_indirect_its_thunk_at(unsigned long addr, int reg)
#endif /* CONFIG_MITIGATION_ITS */
+static void prepend_nops(u8 *bytes, int curlen, int neededlen)
+{
+	u8 newbytes[16];
+	int pad = neededlen - curlen;
+
+	/* Fill padding bytes with NOP. */
+	memset(newbytes, BYTES_NOP1, pad);
+
+	/* Copy the new instruction in. */
+	memcpy(newbytes + pad, bytes, curlen);
+
+	/* And write the final result back out to bytes. */
+	memcpy(bytes, newbytes, neededlen);
+}
+
/*
* Rewrite the compiler generated retpoline thunk calls.
*
@@ -942,10 +957,16 @@ static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
		return ret;
	i += ret;
-	for (; i < insn->length;)
-		bytes[i++] = BYTES_NOP1;
+	/*
+	 * Prepend NOPs to the instruction rather than appending them, so that
+	 * the return site does not change. This is necessary when re-patching
+	 * retpolines at runtime, such as via CONFIG_DYNAMIC_MITIGATIONS, but
+	 * do it always since the performance is the same either way (other
+	 * than for JMP, but those are very rare).
+	 */
+	prepend_nops(bytes, i, insn->length);
-	return i;
+	return insn->length;
}
/*
--
2.34.1
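
As a self-contained illustration of the byte shuffling, here is a minimal
user-space sketch of the prepend_nops() logic. It assumes 0x90 for BYTES_NOP1
and reuses the ff d0 ("call *%rax") example from above; it is illustrative
only, not kernel code:

#include <stdio.h>
#include <string.h>

#define BYTES_NOP1 0x90	/* one-byte NOP */

/*
 * Same logic as the kernel helper above: shift the instruction to the end
 * of the patch site and fill the leading bytes with NOPs.
 */
static void prepend_nops(unsigned char *bytes, int curlen, int neededlen)
{
	unsigned char newbytes[16];
	int pad = neededlen - curlen;

	memset(newbytes, BYTES_NOP1, pad);
	memcpy(newbytes + pad, bytes, curlen);
	memcpy(bytes, newbytes, neededlen);
}

int main(void)
{
	/* 2-byte "call *%rax" that has to fill a 5-byte patch site */
	unsigned char site[5] = { 0xff, 0xd0 };
	int i;

	prepend_nops(site, 2, 5);

	for (i = 0; i < 5; i++)
		printf("%02x ", site[i]);
	printf("\n");	/* prints: 90 90 90 ff d0 */

	return 0;
}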