Message-ID: <20250520102036.5d61f565@fangorn>
Date: Tue, 20 May 2025 10:20:36 -0400
From: Rik van Riel <riel@...riel.com>
To: linux-kernel@...r.kernel.org
Cc: x86@...nel.org, kernel-team@...a.com, dave.hansen@...ux.intel.com,
luto@...nel.org, mingo@...hat.com, bp@...en8.de, peterz@...radead.org,
nadav.amit@...il.com
Subject: [PATCH] x86/mm: fix race between flush_tlb_func and idle task leave_mm
There is a tiny race window between flush_tlb_func() and the call to
leave_mm() from cpuidle_enter_state() in the idle task.

The race happens when a CPU goes idle through enter_lazy_tlb() while
the process on that CPU is transitioning to a global ASID.

If the TLB flush IPI arrives between the call to enter_lazy_tlb() and
the CPU actually going idle, the mm_needs_global_asid() branch in
flush_tlb_func() will switch the CPU to the global ASID and return
with the CPU no longer in lazy TLB mode. If the system then selects a
deeper idle state, the warning in leave_mm() will trigger.

With only the INVLPGB code, this race has not been observed on several
thousand hosts over several weeks, but it shows up several times a
minute in my tests with the RAR code.

Avoid the race by moving the .is_lazy test to before the global ASID
test in flush_tlb_func().
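
To make the ordering easier to see, here is a minimal userspace sketch of
the race (illustration only, not kernel code). The booleans and model_*
helpers below are made-up stand-ins for cpu_tlbstate_shared.is_lazy, the
loaded mm, switch_mm_irqs_off() and mm_needs_global_asid(), and the sketch
assumes leave_mm() returns early when the loaded mm is init_mm and
otherwise warns when the CPU is not in lazy TLB mode.

/*
 * Toy model of the race window, for illustration only.
 * Build with: cc -o tlb-race-model tlb-race-model.c
 */
#include <stdbool.h>
#include <stdio.h>

static bool is_lazy;              /* stand-in for cpu_tlbstate_shared.is_lazy */
static bool loaded_mm_is_init_mm; /* stand-in for loaded_mm == &init_mm */

/* The process on this CPU is transitioning to a global ASID. */
static bool mm_wants_global_asid = true;

/* switch_mm_irqs_off() leaves lazy TLB mode as a side effect. */
static void model_switch_mm(bool to_init_mm)
{
	is_lazy = false;
	loaded_mm_is_init_mm = to_init_mm;
}

static void model_enter_lazy_tlb(void)
{
	is_lazy = true;
}

/* Old ordering: the global ASID transition runs before the lazy check. */
static void model_flush_tlb_func_old(void)
{
	if (mm_wants_global_asid) {
		model_switch_mm(false); /* reload the same mm, lazy mode is gone */
		return;                 /* (the real code goes on to flush) */
	}
	if (is_lazy) {
		model_switch_mm(true);  /* lazy CPUs just switch to init_mm */
		return;
	}
}

/* New ordering: lazy CPUs switch to init_mm and bail out first. */
static void model_flush_tlb_func_new(void)
{
	if (is_lazy) {
		model_switch_mm(true);
		return;
	}
	if (mm_wants_global_asid)
		model_switch_mm(false);
}

/* Models the checks at the top of leave_mm(), per the assumption above. */
static void model_leave_mm(const char *what)
{
	if (loaded_mm_is_init_mm) {
		printf("%s: on init_mm, leave_mm() returns early\n", what);
		return;
	}
	if (!is_lazy)
		printf("%s: WARN_ON(!is_lazy) in leave_mm() would fire\n", what);
}

static void run(const char *what, void (*flush_tlb_func)(void))
{
	is_lazy = false;
	loaded_mm_is_init_mm = false;

	model_enter_lazy_tlb(); /* CPU starts going idle */
	flush_tlb_func();       /* the IPI lands inside the race window */
	model_leave_mm(what);   /* deeper idle state, cpuidle calls leave_mm() */
}

int main(void)
{
	run("old ordering", model_flush_tlb_func_old);
	run("new ordering", model_flush_tlb_func_new);
	return 0;
}

With the lazy check first, the racing CPU ends up on init_mm, so the
init_mm early return in leave_mm() applies and the warning cannot fire.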
Signed-off-by: Rik van Riel <riel@...riel.com>
Fixes: 4afeb0ed1753 ("x86/mm: Enable broadcast TLB invalidation for multi-threaded processes")
Cc: stable@...nel.org
---
arch/x86/mm/tlb.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 3feb6ae2b678..9010bcfdfc20 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1150,6 +1150,20 @@ static void flush_tlb_func(void *info)
 	if (unlikely(loaded_mm == &init_mm))
 		return;
 
+	if (this_cpu_read(cpu_tlbstate_shared.is_lazy)) {
+		/*
+		 * We're in lazy mode. We need to at least flush our
+		 * paging-structure cache to avoid speculatively reading
+		 * garbage into our TLB. Since switching to init_mm is barely
+		 * slower than a minimal flush, just switch to init_mm.
+		 *
+		 * This should be rare, with native_flush_tlb_multi() skipping
+		 * IPIs to lazy TLB mode CPUs.
+		 */
+		switch_mm_irqs_off(NULL, &init_mm, NULL);
+		return;
+	}
+
 	/* Reload the ASID if transitioning into or out of a global ASID */
 	if (mm_needs_global_asid(loaded_mm, loaded_mm_asid)) {
 		switch_mm_irqs_off(NULL, loaded_mm, NULL);
@@ -1168,20 +1182,6 @@ static void flush_tlb_func(void *info)
 	VM_WARN_ON(is_dyn_asid(loaded_mm_asid) && loaded_mm->context.ctx_id !=
 		   this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].ctx_id));
 
-	if (this_cpu_read(cpu_tlbstate_shared.is_lazy)) {
-		/*
-		 * We're in lazy mode. We need to at least flush our
-		 * paging-structure cache to avoid speculatively reading
-		 * garbage into our TLB. Since switching to init_mm is barely
-		 * slower than a minimal flush, just switch to init_mm.
-		 *
-		 * This should be rare, with native_flush_tlb_multi() skipping
-		 * IPIs to lazy TLB mode CPUs.
-		 */
-		switch_mm_irqs_off(NULL, &init_mm, NULL);
-		return;
-	}
-
 	if (is_dyn_asid(loaded_mm_asid))
 		local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
 
--
2.47.1