Message-ID: <20250520102036.5d61f565@fangorn>
Date: Tue, 20 May 2025 10:20:36 -0400
From: Rik van Riel <riel@...riel.com>
To: linux-kernel@...r.kernel.org
Cc: x86@...nel.org, kernel-team@...a.com, dave.hansen@...ux.intel.com,
 luto@...nel.org, mingo@...hat.com, bp@...en8.de, peterz@...radead.org,
 nadav.amit@...il.com
Subject: [PATCH] x86/mm: fix race between flush_tlb_func and idle task
 leave_mm

There is a tiny race window between flush_tlb_func() and the call to
leave_mm() from cpuidle_enter_state() in the idle task.

The race happens when a CPU goes idle, through enter_lazy_tlb(),
while the process on the CPU is transitioning to a global ASID.

If the TLB flush IPI arrives between the call to enter_lazy_tlb()
and the CPU actually going idle, the mm_needs_global_asid() branch
in flush_tlb_func() will switch the CPU to the global ASID and
return with the CPU no longer in lazy TLB mode.

If the system then selects a deeper idle state, the warning in
leave_mm() will trigger.
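
A rough sketch of the interleaving (assuming the leave_mm() warning is
its WARN_ON(!is_lazy) check, and that switch_mm_irqs_off() clears
is_lazy, as described above):

    CPU going idle                             TLB flush sender
    --------------                             ----------------
    enter_lazy_tlb()
      cpu_tlbstate_shared.is_lazy = true
                                               sends TLB flush IPI
    flush_tlb_func()
      mm_needs_global_asid() -> true
      switch_mm_irqs_off(NULL, loaded_mm, NULL)
        is_lazy = false
    cpuidle_enter_state() selects a deeper idle state
      leave_mm()
        warning triggers: CPU is no longer in lazy TLB mode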

This race has not been observed with only the INVLPGB code, despite
running on several thousand hosts for several weeks, but it shows up
several times a minute in my tests with the RAR code.

Avoid the race by moving the .is_lazy test to before the global ASID
test in flush_tlb_func(), so a lazy CPU is switched to init_mm and
never reaches the global ASID branch.

Signed-off-by: Rik van Riel <riel@...riel.com>
Fixes: 4afeb0ed1753 ("x86/mm: Enable broadcast TLB invalidation for multi-threaded processes")
Cc: stable@...nel.org
---
 arch/x86/mm/tlb.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 3feb6ae2b678..9010bcfdfc20 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1150,6 +1150,20 @@ static void flush_tlb_func(void *info)
 	if (unlikely(loaded_mm == &init_mm))
 		return;
 
+	if (this_cpu_read(cpu_tlbstate_shared.is_lazy)) {
+		/*
+		 * We're in lazy mode.  We need to at least flush our
+		 * paging-structure cache to avoid speculatively reading
+		 * garbage into our TLB.  Since switching to init_mm is barely
+		 * slower than a minimal flush, just switch to init_mm.
+		 *
+		 * This should be rare, with native_flush_tlb_multi() skipping
+		 * IPIs to lazy TLB mode CPUs.
+		 */
+		switch_mm_irqs_off(NULL, &init_mm, NULL);
+		return;
+	}
+
 	/* Reload the ASID if transitioning into or out of a global ASID */
 	if (mm_needs_global_asid(loaded_mm, loaded_mm_asid)) {
 		switch_mm_irqs_off(NULL, loaded_mm, NULL);
@@ -1168,20 +1182,6 @@ static void flush_tlb_func(void *info)
 	VM_WARN_ON(is_dyn_asid(loaded_mm_asid) && loaded_mm->context.ctx_id !=
 		   this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].ctx_id));
 
-	if (this_cpu_read(cpu_tlbstate_shared.is_lazy)) {
-		/*
-		 * We're in lazy mode.  We need to at least flush our
-		 * paging-structure cache to avoid speculatively reading
-		 * garbage into our TLB.  Since switching to init_mm is barely
-		 * slower than a minimal flush, just switch to init_mm.
-		 *
-		 * This should be rare, with native_flush_tlb_multi() skipping
-		 * IPIs to lazy TLB mode CPUs.
-		 */
-		switch_mm_irqs_off(NULL, &init_mm, NULL);
-		return;
-	}
-
 	if (is_dyn_asid(loaded_mm_asid))
 		local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
 
-- 
2.47.1


