Message-ID: <20250520110632.280908218@infradead.org>
Date: Tue, 20 May 2025 12:55:45 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: x86@...nel.org
Cc: linux-kernel@...r.kernel.org,
 peterz@...radead.org,
 kys@...rosoft.com,
 haiyangz@...rosoft.com,
 wei.liu@...nel.org,
 decui@...rosoft.com,
 tglx@...utronix.de,
 mingo@...hat.com,
 bp@...en8.de,
 dave.hansen@...ux.intel.com,
 hpa@...or.com,
 luto@...nel.org,
 linux-hyperv@...r.kernel.org
Subject: [PATCH 3/3] x86/mm: Clarify should_flush_tlb() ordering

The ordering in should_flush_tlb() is entirely non-obvious and is only
correct because x86 is TSO. Clarify the situation by replacing the two
WRITE_ONCE()s of loaded_mm in switch_mm_irqs_off() with
smp_store_release(), which on x86 is purely cosmetic: TSO already orders
the stores, so the release only documents the required ordering.

Additionally, clarify the comment in should_flush_tlb().

Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
---
 arch/x86/mm/tlb.c |   30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)
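
Side note, in case it helps review: the pairing being documented here is
the classic message-passing pattern. A rough userspace analogue using C11
atomics is sketched below (names invented;
atomic_store_explicit(..., memory_order_release) stands in for
smp_store_release() and atomic_thread_fence(memory_order_acquire) for
smp_rmb()). On x86 all of these compile to plain MOVs plus compiler
barriers, which is why the conversion is cosmetic there.

/*
 * Rough userspace analogue of the switch_mm_irqs_off() /
 * should_flush_tlb() pairing; C11 atomics, not kernel code.
 * loaded_mm, is_lazy, SWITCHING and NEXT are stand-ins.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define SWITCHING	1
#define NEXT		2

static atomic_int loaded_mm;		/* 0 == previous mm */
static atomic_int is_lazy = 1;		/* CPU starts out lazy */

/* switch_mm_irqs_off() side */
static void *switcher(void *arg)
{
	(void)arg;

	atomic_store_explicit(&is_lazy, 0, memory_order_relaxed);
	/* smp_store_release(loaded_mm, SWITCHING) */
	atomic_store_explicit(&loaded_mm, SWITCHING, memory_order_release);
	/* mov-cr3 would go here */
	/* smp_store_release(loaded_mm, next) */
	atomic_store_explicit(&loaded_mm, NEXT, memory_order_release);
	return NULL;
}

/* should_flush_tlb() side */
static void *flusher(void *arg)
{
	int mm, lazy;

	(void)arg;

	mm = atomic_load_explicit(&loaded_mm, memory_order_relaxed);
	/* smp_rmb() */
	atomic_thread_fence(memory_order_acquire);
	lazy = atomic_load_explicit(&is_lazy, memory_order_relaxed);

	/* Seeing SWITCHING or NEXT implies the is_lazy clear is visible. */
	if (mm == SWITCHING || mm == NEXT)
		assert(!lazy);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, switcher, NULL);
	pthread_create(&b, NULL, flusher, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	puts("ok");
	return 0;
}

This will not actually trip on x86 even without the fences (TSO being the
whole point of the changelog); it only spells out the invariant relied on,
it is not meant as a reproducer.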

--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -910,8 +910,10 @@ void switch_mm_irqs_off(struct mm_struct
 		 * Indicate that CR3 is about to change. nmi_uaccess_okay()
 		 * and others are sensitive to the window where mm_cpumask(),
 		 * CR3 and cpu_tlbstate.loaded_mm are not all in sync.
+		 *
+		 * Also, see should_flush_tlb().
 		 */
-		WRITE_ONCE(this_tlbstate->loaded_mm, LOADED_MM_SWITCHING);
+		smp_store_release(&this_tlbstate->loaded_mm, LOADED_MM_SWITCHING);
 		barrier();
 
 		/* Start receiving IPIs and then read tlb_gen (and LAM below) */
@@ -938,10 +940,11 @@ void switch_mm_irqs_off(struct mm_struct
 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 0);
 	}
 
-	/* Make sure we write CR3 before loaded_mm. */
-	barrier();
-
-	WRITE_ONCE(this_tlbstate->loaded_mm, next);
+	/*
+	 * Make sure we write CR3 before loaded_mm.
+	 * See nmi_uaccess_okay() and should_flush_tlb().
+	 */
+	smp_store_release(&this_tlbstate->loaded_mm, next);
 	WRITE_ONCE(this_tlbstate->loaded_mm_asid, ns.asid);
 	cpu_tlbstate_update_lam(new_lam, mm_untag_mask(next));
 
@@ -1280,9 +1283,20 @@ static bool should_flush_tlb(int cpu, vo
 	struct flush_tlb_info *info = data;
 
 	/*
-	 * Order the 'loaded_mm' and 'is_lazy' against their
-	 * write ordering in switch_mm_irqs_off(). Ensure
-	 * 'is_lazy' is at least as new as 'loaded_mm'.
+	 * switch_mm_irqs_off()				should_flush_tlb()
+	 *   WRITE_ONCE(is_lazy, false);		  loaded_mm = READ_ONCE(loaded_mm);
+	 *   smp_store_release(loaded_mm, SWITCHING);     smp_rmb();
+	 *   mov-cr3
+	 *   smp_store_release(loaded_mm, next)
+	 *                                                if (READ_ONCE(is_lazy))
+	 *                                                  return false;
+	 *
+	 * Where smp_rmb() matches against either smp_store_release() to
+	 * ensure that if we observe loaded_mm to be either SWITCHING or next
+	 * we must also observe is_lazy == false.
+	 *
+	 * If this were not so, it would be possible to falsely return false
+	 * and miss sending an invalidation IPI.
 	 */
 	smp_rmb();
 
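If anyone wants to throw the new comment at herd7, it reduces to the
usual MP shape. Something along these lines (untested sketch, names
invented, only one of the two release stores shown) should be reported
as never happening under tools/memory-model:

C MP+tlbflush-sketch

{
	is_lazy=1;
}

P0(int *is_lazy, int *loaded_mm)
{
	WRITE_ONCE(*is_lazy, 0);
	smp_store_release(loaded_mm, 1);
}

P1(int *is_lazy, int *loaded_mm)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*loaded_mm);
	smp_rmb();
	r1 = READ_ONCE(*is_lazy);
}

exists (1:r0=1 /\ 1:r1=1)

The 'exists' clause is the bad outcome from the last paragraph of the
comment: a stale is_lazy observed after the new loaded_mm, which would
make should_flush_tlb() falsely skip the IPI.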


