linux-kernel - [PATCH RFC v2 26/29] x86: Create library for flushing L1D for L1TF

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250110-asi-rfc-v2-v2-26-8419288bc805@google.com>
Date: Fri, 10 Jan 2025 18:40:52 +0000
From: Brendan Jackman <jackmanb@...gle.com>
To: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, 
	Andy Lutomirski <luto@...nel.org>, Peter Zijlstra <peterz@...radead.org>, 
	Richard Henderson <richard.henderson@...aro.org>, Matt Turner <mattst88@...il.com>, 
	Vineet Gupta <vgupta@...nel.org>, Russell King <linux@...linux.org.uk>, 
	Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>, Guo Ren <guoren@...nel.org>, 
	Brian Cain <bcain@...cinc.com>, Huacai Chen <chenhuacai@...nel.org>, 
	WANG Xuerui <kernel@...0n.name>, Geert Uytterhoeven <geert@...ux-m68k.org>, 
	Michal Simek <monstr@...str.eu>, Thomas Bogendoerfer <tsbogend@...ha.franken.de>, 
	Dinh Nguyen <dinguyen@...nel.org>, Jonas Bonn <jonas@...thpole.se>, 
	Stefan Kristiansson <stefan.kristiansson@...nalahti.fi>, Stafford Horne <shorne@...il.com>, 
	"James E.J. Bottomley" <James.Bottomley@...senPartnership.com>, Helge Deller <deller@....de>, 
	Michael Ellerman <mpe@...erman.id.au>, Nicholas Piggin <npiggin@...il.com>, 
	Christophe Leroy <christophe.leroy@...roup.eu>, Naveen N Rao <naveen@...nel.org>, 
	Madhavan Srinivasan <maddy@...ux.ibm.com>, Paul Walmsley <paul.walmsley@...ive.com>, 
	Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>, 
	Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>, 
	Alexander Gordeev <agordeev@...ux.ibm.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>, 
	Sven Schnelle <svens@...ux.ibm.com>, Yoshinori Sato <ysato@...rs.sourceforge.jp>, 
	Rich Felker <dalias@...c.org>, John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>, 
	"David S. Miller" <davem@...emloft.net>, Andreas Larsson <andreas@...sler.com>, 
	Richard Weinberger <richard@....at>, Anton Ivanov <anton.ivanov@...bridgegreys.com>, 
	Johannes Berg <johannes@...solutions.net>, Chris Zankel <chris@...kel.net>, 
	Max Filippov <jcmvbkbc@...il.com>, Arnd Bergmann <arnd@...db.de>, 
	Andrew Morton <akpm@...ux-foundation.org>, Juri Lelli <juri.lelli@...hat.com>, 
	Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Valentin Schneider <vschneid@...hat.com>, Uladzislau Rezki <urezki@...il.com>, 
	Christoph Hellwig <hch@...radead.org>, Masami Hiramatsu <mhiramat@...nel.org>, 
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Mike Rapoport <rppt@...nel.org>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Ian Rogers <irogers@...gle.com>, Adrian Hunter <adrian.hunter@...el.com>, 
	Dennis Zhou <dennis@...nel.org>, Tejun Heo <tj@...nel.org>, Christoph Lameter <cl@...ux.com>, 
	Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>, 
	Ard Biesheuvel <ardb@...nel.org>, Josh Poimboeuf <jpoimboe@...nel.org>, 
	Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, linux-alpha@...r.kernel.org, 
	linux-snps-arc@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org, 
	linux-csky@...r.kernel.org, linux-hexagon@...r.kernel.org, 
	loongarch@...ts.linux.dev, linux-m68k@...ts.linux-m68k.org, 
	linux-mips@...r.kernel.org, linux-openrisc@...r.kernel.org, 
	linux-parisc@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org, 
	linux-riscv@...ts.infradead.org, linux-s390@...r.kernel.org, 
	linux-sh@...r.kernel.org, sparclinux@...r.kernel.org, 
	linux-um@...ts.infradead.org, linux-arch@...r.kernel.org, linux-mm@...ck.org, 
	linux-trace-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org, 
	kvm@...r.kernel.org, linux-efi@...r.kernel.org, 
	Brendan Jackman <jackmanb@...gle.com>
Subject: [PATCH RFC v2 26/29] x86: Create library for flushing L1D for L1TF

ASI will need to use this L1D flushing logic so put it in a library
where it can be used independently of KVM.

Since we're creating this library, it starts to look messy if we don't
also use it in the double-opt-in (both kernel cmdline and prctl)
mm-switching flush logic which is there for mitigating Snoop-Assisted L1
Data Sampling ("SAL1DS"). However, that logic doesn't use any
software-based fallback for flushing on CPUs without the L1D_FLUSH
command. In that case the prctl opt-in will fail.

One option would be to just start using the software fallback sequence
currently done by VMX code, but Linus didn't seem happy with a similar
sequence being used here [1]. CPUs affected by SAL1DS are a subset of
those affected by L1TF, so it wouldn't be completely insane to assume
that the same sequence works for both cases, but I'll err on the side of
caution and avoid risk of giving users a false impression that the
kernel has really flushed L1D for them.

[1] https://lore.kernel.org/linux-kernel/CAHk-=whC4PUhErcoDhCbTOdmPPy-Pj8j9ytsdcyz9TorOb4KUw@mail.gmail.com/

Instead, create this awkward library that is scoped specifically to L1TF,
which will be used only by VMX and ASI, and has an annoying "only
sometimes works" doc-comment. Users of the library can then infer from
that comment whether they have flushed L1D.

No functional change intended.

Checkpatch-args: --ignore=COMMIT_LOG_LONG_LINE
Signed-off-by: Brendan Jackman <jackmanb@...gle.com>
---
 arch/x86/Kconfig            |  4 ++
 arch/x86/include/asm/l1tf.h | 11 ++++++
 arch/x86/kvm/Kconfig        |  1 +
 arch/x86/kvm/vmx/vmx.c      | 66 +++----------------------------
 arch/x86/lib/Makefile       |  1 +
 arch/x86/lib/l1tf.c         | 94 +++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 117 insertions(+), 60 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ae31f36ce23d7c29d1e90b726c5a2e6ea5a63c8d..ca984dc7ee2f2b68c3ce1bcb5055047ca4f2a65d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2523,6 +2523,7 @@ config MITIGATION_ADDRESS_SPACE_ISOLATION
 	bool "Allow code to run with a reduced kernel address space"
 	default n
 	depends on X86_64 && !PARAVIRT && !UML
+	select X86_L1TF_FLUSH_LIB
 	help
 	  This feature provides the ability to run some kernel code
 	  with a reduced kernel address space. This can be used to
@@ -3201,6 +3202,9 @@ config HAVE_ATOMIC_IOMAP
 	def_bool y
 	depends on X86_32
 
+config X86_L1TF_FLUSH_LIB
+	def_bool n
+
 source "arch/x86/kvm/Kconfig"
 
 source "arch/x86/Kconfig.assembler"
diff --git a/arch/x86/include/asm/l1tf.h b/arch/x86/include/asm/l1tf.h
new file mode 100644
index 0000000000000000000000000000000000000000..e0be19c588bb5ec5c76a1861492e48b88615b4b8
--- /dev/null
+++ b/arch/x86/include/asm/l1tf.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_L1TF_FLUSH_H
+#define _ASM_L1TF_FLUSH_H
+
+#ifdef CONFIG_X86_L1TF_FLUSH_LIB
+int l1tf_flush_setup(void);
+void l1tf_flush(void);
+#endif /* CONFIG_X86_L1TF_FLUSH_LIB */
+
+#endif
+
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index f09f13c01c6bbd28fa37fdf50547abf4403658c9..81c71510e33e52447882ab7b22682199c57b492e 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -92,6 +92,7 @@ config KVM_SW_PROTECTED_VM
 config KVM_INTEL
 	tristate "KVM for Intel (and compatible) processors support"
 	depends on KVM && IA32_FEAT_CTL
+	select X86_L1TF_FLUSH_LIB
 	help
 	  Provides support for KVM on processors equipped with Intel's VT
 	  extensions, a.k.a. Virtual Machine Extensions (VMX).
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0e90463f1f2183b8d716f85d5c8a8af8958fef0b..b1a02f27b3abce0ef6ac448b66bef2c653a52eef 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -42,6 +42,7 @@
 #include <asm/idtentry.h>
 #include <asm/io.h>
 #include <asm/irq_remapping.h>
+#include <asm/l1tf.h>
 #include <asm/reboot.h>
 #include <asm/perf_event.h>
 #include <asm/mmu_context.h>
@@ -250,9 +251,6 @@ static void *vmx_l1d_flush_pages;
 
 static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
 {
-	struct page *page;
-	unsigned int i;
-
 	if (!boot_cpu_has_bug(X86_BUG_L1TF)) {
 		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
 		return 0;
@@ -288,26 +286,11 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
 		l1tf = VMENTER_L1D_FLUSH_ALWAYS;
 	}
 
-	if (l1tf != VMENTER_L1D_FLUSH_NEVER && !vmx_l1d_flush_pages &&
-	    !boot_cpu_has(X86_FEATURE_FLUSH_L1D)) {
-		/*
-		 * This allocation for vmx_l1d_flush_pages is not tied to a VM
-		 * lifetime and so should not be charged to a memcg.
-		 */
-		page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER);
-		if (!page)
-			return -ENOMEM;
-		vmx_l1d_flush_pages = page_address(page);
+	if (l1tf != VMENTER_L1D_FLUSH_NEVER) {
+		int err = l1tf_flush_setup();
 
-		/*
-		 * Initialize each page with a different pattern in
-		 * order to protect against KSM in the nested
-		 * virtualization case.
-		 */
-		for (i = 0; i < 1u << L1D_CACHE_ORDER; ++i) {
-			memset(vmx_l1d_flush_pages + i * PAGE_SIZE, i + 1,
-			       PAGE_SIZE);
-		}
+		if (err)
+			return err;
 	}
 
 	l1tf_vmx_mitigation = l1tf;
@@ -6652,20 +6635,8 @@ int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	return ret;
 }
 
-/*
- * Software based L1D cache flush which is used when microcode providing
- * the cache control MSR is not loaded.
- *
- * The L1D cache is 32 KiB on Nehalem and later microarchitectures, but to
- * flush it is required to read in 64 KiB because the replacement algorithm
- * is not exactly LRU. This could be sized at runtime via topology
- * information but as all relevant affected CPUs have 32KiB L1D cache size
- * there is no point in doing so.
- */
 static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 {
-	int size = PAGE_SIZE << L1D_CACHE_ORDER;
-
 	/*
 	 * This code is only executed when the flush mode is 'cond' or
 	 * 'always'
@@ -6695,32 +6666,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 
 	vcpu->stat.l1d_flush++;
 
-	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
-		native_wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
-		return;
-	}
-
-	asm volatile(
-		/* First ensure the pages are in the TLB */
-		"xorl	%%eax, %%eax\n"
-		".Lpopulate_tlb:\n\t"
-		"movzbl	(%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
-		"addl	$4096, %%eax\n\t"
-		"cmpl	%%eax, %[size]\n\t"
-		"jne	.Lpopulate_tlb\n\t"
-		"xorl	%%eax, %%eax\n\t"
-		"cpuid\n\t"
-		/* Now fill the cache */
-		"xorl	%%eax, %%eax\n"
-		".Lfill_cache:\n"
-		"movzbl	(%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
-		"addl	$64, %%eax\n\t"
-		"cmpl	%%eax, %[size]\n\t"
-		"jne	.Lfill_cache\n\t"
-		"lfence\n"
-		:: [flush_pages] "r" (vmx_l1d_flush_pages),
-		    [size] "r" (size)
-		: "eax", "ebx", "ecx", "edx");
+	l1tf_flush();
 }
 
 void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 98583a9dbab337e09a2e58905e5200499a496a07..b0a45bd70b40743a3fccb352b9641caacac83275 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -37,6 +37,7 @@ lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
 lib-$(CONFIG_FUNCTION_ERROR_INJECTION)	+= error-inject.o
 lib-$(CONFIG_MITIGATION_RETPOLINE) += retpoline.o
+lib-$(CONFIG_X86_L1TF_FLUSH_LIB) += l1tf.o
 
 obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
 obj-y += iomem.o
diff --git a/arch/x86/lib/l1tf.c b/arch/x86/lib/l1tf.c
new file mode 100644
index 0000000000000000000000000000000000000000..c474f18ae331c8dfa7a029c457dd3cf75bebf808
--- /dev/null
+++ b/arch/x86/lib/l1tf.c
@@ -0,0 +1,94 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+
+#include <asm/cpufeature.h>
+#include <asm/l1tf.h>
+#include <asm/msr.h>
+
+#define L1D_CACHE_ORDER 4
+static void *l1tf_flush_pages;
+
+int l1tf_flush_setup(void)
+{
+	struct page *page;
+	unsigned int i;
+
+	if (l1tf_flush_pages || boot_cpu_has(X86_FEATURE_FLUSH_L1D))
+		return 0;
+
+	page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER);
+	if (!page)
+		return -ENOMEM;
+	l1tf_flush_pages = page_address(page);
+
+	/*
+	 * Initialize each page with a different pattern in
+	 * order to protect against KSM in the nested
+	 * virtualization case.
+	 */
+	for (i = 0; i < 1u << L1D_CACHE_ORDER; ++i) {
+		memset(l1tf_flush_pages + i * PAGE_SIZE, i + 1,
+			 PAGE_SIZE);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(l1tf_flush_setup);
+
+/*
+ * Flush L1D in a way that:
+ *
+ *  - definitely works on CPUs X86_FEATURE_FLUSH_L1D (because the SDM says so).
+ *  - almost definitely works on other CPUs with L1TF (because someone on LKML
+ *    said someone from Intel said so).
+ *  - may or may not work on other CPUs.
+ *
+ * Don't call unless l1tf_flush_setup() has returned successfully.
+ */
+noinstr void l1tf_flush(void)
+{
+	int size = PAGE_SIZE << L1D_CACHE_ORDER;
+
+	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
+		native_wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
+		return;
+	}
+
+	if (WARN_ON(!l1tf_flush_pages))
+		return;
+
+	/*
+	 * This sequence was provided by Intel for the purpose of mitigating
+	 * L1TF on VMX.
+	 *
+	 * The L1D cache is 32 KiB on Nehalem and some later microarchitectures,
+	 * but to flush it is required to read in 64 KiB because the replacement
+	 * algorithm is not exactly LRU. This could be sized at runtime via
+	 * topology information but as all relevant affected CPUs have 32KiB L1D
+	 * cache size there is no point in doing so.
+	 */
+	asm volatile(
+		/* First ensure the pages are in the TLB */
+		"xorl	%%eax, %%eax\n"
+		".Lpopulate_tlb:\n\t"
+		"movzbl	(%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
+		"addl	$4096, %%eax\n\t"
+		"cmpl	%%eax, %[size]\n\t"
+		"jne	.Lpopulate_tlb\n\t"
+		"xorl	%%eax, %%eax\n\t"
+		"cpuid\n\t"
+		/* Now fill the cache */
+		"xorl	%%eax, %%eax\n"
+		".Lfill_cache:\n"
+		"movzbl	(%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
+		"addl	$64, %%eax\n\t"
+		"cmpl	%%eax, %[size]\n\t"
+		"jne	.Lfill_cache\n\t"
+		"lfence\n"
+		:: [flush_pages] "r" (l1tf_flush_pages),
+		    [size] "r" (size)
+		: "eax", "ebx", "ecx", "edx");
+}
+EXPORT_SYMBOL(l1tf_flush);

-- 
2.47.1.613.gc27f4b7a9f-goog