Message-ID: <20251114151428.1064524-9-vschneid@redhat.com>
Date: Fri, 14 Nov 2025 16:14:26 +0100
From: Valentin Schneider <vschneid@...hat.com>
To: linux-kernel@...r.kernel.org,
linux-mm@...ck.org,
rcu@...r.kernel.org,
x86@...nel.org,
linux-arm-kernel@...ts.infradead.org,
loongarch@...ts.linux.dev,
linux-riscv@...ts.infradead.org,
linux-arch@...r.kernel.org,
linux-trace-kernel@...r.kernel.org
Cc: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>,
Andy Lutomirski <luto@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Josh Poimboeuf <jpoimboe@...nel.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Arnd Bergmann <arnd@...db.de>,
Frederic Weisbecker <frederic@...nel.org>,
"Paul E. McKenney" <paulmck@...nel.org>,
Jason Baron <jbaron@...mai.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ard Biesheuvel <ardb@...nel.org>,
Sami Tolvanen <samitolvanen@...gle.com>,
"David S. Miller" <davem@...emloft.net>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
Joel Fernandes <joelagnelf@...dia.com>,
Josh Triplett <josh@...htriplett.org>,
Boqun Feng <boqun.feng@...il.com>,
Uladzislau Rezki <urezki@...il.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Mel Gorman <mgorman@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Masahiro Yamada <masahiroy@...nel.org>,
Han Shen <shenhan@...gle.com>,
Rik van Riel <riel@...riel.com>,
Jann Horn <jannh@...gle.com>,
Dan Carpenter <dan.carpenter@...aro.org>,
Oleg Nesterov <oleg@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Clark Williams <williams@...hat.com>,
Yair Podemsky <ypodemsk@...hat.com>,
Marcelo Tosatti <mtosatti@...hat.com>,
Daniel Wagner <dwagner@...e.de>,
Petr Tesarik <ptesarik@...e.com>,
Shrikanth Hegde <sshegde@...ux.ibm.com>
Subject: [RFC PATCH v7 29/31] x86/mm/pti: Implement a TLB flush immediately after a switch to kernel CR3

Deferring kernel range TLB flushes requires a guarantee that no stale TLB
entry can be accessed once the kernel has been entered. The simplest way to
provide that guarantee is to issue an unconditional flush right after
switching to the kernel CR3, since this is the point from which such stale
entries could be accessed.

As this is only relevant to NOHZ_FULL, restrict the mechanism to NOHZ_FULL
CPUs.

Note that the COALESCE_TLBI config option is introduced in a later commit,
when the whole feature is implemented.

Signed-off-by: Valentin Schneider <vschneid@...hat.com>
---
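
For readers who find the control flow easier to follow in C, here is a
rough user-space model of what the COALESCE_TLBI macro in this patch appears
to do. It is only a sketch under stated assumptions, not kernel code: every
name in it (cpu_model, coalesce_tlbi_model, the FLUSH_* values,
MODEL_CR4_PGE) is hypothetical, the nohz_full mask is squeezed into a single
64-bit word, and it assumes the static branch skips the whole block while
housekeeping_overridden is false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CR4.PGE is bit 7; toggling it flushes all TLB entries, globals included. */
#define MODEL_CR4_PGE (1ULL << 7)

/* Hypothetical labels for the three outcomes of the macro. */
enum flush_method { FLUSH_NONE, FLUSH_INVPCID_ALL, FLUSH_CR4_PGE_TOGGLE };

/* Hypothetical stand-in for the per-CPU state the macro touches. */
struct cpu_model {
	unsigned int cpu;        /* models cpu_number */
	bool has_invpcid;        /* models X86_FEATURE_INVPCID */
	uint64_t cached_cr4;     /* models cpu_tlbstate.cr4 (never modified) */
	uint64_t hw_cr4;         /* models the CR4 register itself */
	unsigned int cr4_writes; /* number of modelled CR4 writes */
	int kernel_cr3_loaded;   /* models the per-CPU kernel_cr3_loaded flag */
};

static void model_write_cr4(struct cpu_model *c, uint64_t val)
{
	c->hw_cr4 = val;
	c->cr4_writes++;
}

/* Runs right after the switch to the kernel CR3 in the modelled flow. */
static enum flush_method
coalesce_tlbi_model(struct cpu_model *c, bool housekeeping_overridden,
		    uint64_t nohz_full_mask)
{
	enum flush_method method = FLUSH_NONE;

	assert(c->cpu < 64); /* model limitation: one 64-bit mask word */

	/* Assumed static-branch behaviour: skip everything, flag included. */
	if (!housekeeping_overridden)
		return FLUSH_NONE;

	/* Housekeeping CPUs skip the flush but still set the flag below. */
	if ((nohz_full_mask >> c->cpu) & 1) {
		if (c->has_invpcid) {
			/* invpcid with INVPCID_TYPE_ALL_INCL_GLOBAL */
			method = FLUSH_INVPCID_ALL;
		} else {
			/* Clear PGE, then restore the cached value. */
			uint64_t cr4 = c->cached_cr4 ^ MODEL_CR4_PGE;

			model_write_cr4(c, cr4);
			model_write_cr4(c, cr4 ^ MODEL_CR4_PGE);
			method = FLUSH_CR4_PGE_TOGGLE;
		}
	}

	c->kernel_cr3_loaded = 1;
	return method;
}
```

In this model a NOHZ_FULL CPU with INVPCID flushes without touching CR4, a
NOHZ_FULL CPU without it performs two CR4 writes and ends up with the cached
value loaded again, and a housekeeping CPU only records that the kernel CR3
is loaded.
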
arch/x86/entry/calling.h | 25 ++++++++++++++++++++++---
arch/x86/kernel/asm-offsets.c | 1 +
2 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 0187c0ea2fddb..620203ef04e9f 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -10,6 +10,7 @@
#include <asm/msr.h>
#include <asm/nospec-branch.h>
#include <asm/jump_label.h>
+#include <asm/invpcid.h>
/*
@@ -171,9 +172,27 @@ For 32-bit we have the following conventions - kernel is built with
andq $(~PTI_USER_PGTABLE_AND_PCID_MASK), \reg
.endm
-.macro COALESCE_TLBI
+.macro COALESCE_TLBI scratch_reg:req
#ifdef CONFIG_COALESCE_TLBI
STATIC_BRANCH_FALSE_LIKELY housekeeping_overridden, .Lend_\@
+ /* No point in doing this for housekeeping CPUs */
+ movslq PER_CPU_VAR(cpu_number), \scratch_reg
+ bt \scratch_reg, tick_nohz_full_mask(%rip)
+ jnc .Lend_tlbi_\@
+
+ ALTERNATIVE "jmp .Lcr4_\@", "", X86_FEATURE_INVPCID
+ movq $(INVPCID_TYPE_ALL_INCL_GLOBAL), \scratch_reg
+ /* descriptor is all zeroes, point at the zero page */
+ invpcid empty_zero_page(%rip), \scratch_reg
+ jmp .Lend_tlbi_\@
+.Lcr4_\@:
+ /* Note: this gives CR4 pinning the finger */
+ movq PER_CPU_VAR(cpu_tlbstate + TLB_STATE_cr4), \scratch_reg
+ xorq $(X86_CR4_PGE), \scratch_reg
+ movq \scratch_reg, %cr4
+ xorq $(X86_CR4_PGE), \scratch_reg
+ movq \scratch_reg, %cr4
+.Lend_tlbi_\@:
movl $1, PER_CPU_VAR(kernel_cr3_loaded)
.Lend_\@:
#endif // CONFIG_COALESCE_TLBI
@@ -192,7 +211,7 @@ For 32-bit we have the following conventions - kernel is built with
mov %cr3, \scratch_reg
ADJUST_KERNEL_CR3 \scratch_reg
mov \scratch_reg, %cr3
- COALESCE_TLBI
+ COALESCE_TLBI \scratch_reg
.Lend_\@:
.endm
@@ -260,7 +279,7 @@ For 32-bit we have the following conventions - kernel is built with
ADJUST_KERNEL_CR3 \scratch_reg
movq \scratch_reg, %cr3
- COALESCE_TLBI
+ COALESCE_TLBI \scratch_reg
.Ldone_\@:
.endm
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 32ba599a51f88..deb92e9c8923d 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -106,6 +106,7 @@ static void __used common(void)
/* TLB state for the entry code */
OFFSET(TLB_STATE_user_pcid_flush_mask, tlb_state, user_pcid_flush_mask);
+ OFFSET(TLB_STATE_cr4, tlb_state, cr4);
/* Layout info for cpu_entry_area */
OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
--
2.51.0