Message-ID: <174652454811.406.13340350863311949872.tip-bot2@tip-bot2>
Date: Tue, 06 May 2025 09:42:28 -0000
From: "tip-bot2 for Peter Zijlstra" <tip-bot2@...utronix.de>
To: linux-tip-commits@...r.kernel.org
Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@...el.com>,
 Jani Nikula <jani.nikula@...ux.intel.com>,
 Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>,
 Andrew Cooper <andrew.cooper3@...rix.com>, Andy Lutomirski <luto@...nel.org>,
 Brian Gerst <brgerst@...il.com>, "H. Peter Anvin" <hpa@...or.com>,
 Juergen Gross <jgross@...e.com>,
 Linus Torvalds <torvalds@...ux-foundation.org>,
 Rik van Riel <riel@...riel.com>, x86@...nel.org, linux-kernel@...r.kernel.org
Subject: [tip: x86/alternatives] x86/mm: Fix false positive warning in
 switch_mm_irqs_off()

The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID:     7f9958230d8a79d474829bee25ec9426397335ce
Gitweb:        https://git.kernel.org/tip/7f9958230d8a79d474829bee25ec9426397335ce
Author:        Peter Zijlstra <peterz@...radead.org>
AuthorDate:    Wed, 30 Apr 2025 10:11:54 +02:00
Committer:     Ingo Molnar <mingo@...nel.org>
CommitterDate: Tue, 06 May 2025 11:28:57 +02:00

x86/mm: Fix false positive warning in switch_mm_irqs_off()

Multiple testers reported the following new warning:

  WARNING: CPU: 0 PID: 0 at arch/x86/mm/tlb.c:795

Which corresponds to:

	if (IS_ENABLED(CONFIG_DEBUG_VM) && WARN_ON_ONCE(prev != &init_mm &&
			!cpumask_test_cpu(cpu, mm_cpumask(next))))
		cpumask_set_cpu(cpu, mm_cpumask(next));

So the problem is that unuse_temporary_mm() explicitly clears that
mm_cpumask bit; and it has to, because otherwise the flush_tlb_mm_range()
in __text_poke() would try sending IPIs, which are not needed at all.

See also:

  https://lore.kernel.org/all/20241113095550.GBZzR3pg-RhJKPDazS@fat_crate.local/

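For reference, the relevant part of __text_poke() goes roughly like this
(an illustrative sketch, not the exact kernel code: error handling, the
cross-page case and the PTE setup/teardown are elided, and 'opcode' and
'len' are placeholder names):

	prev_mm = use_temporary_mm(text_poke_mm);	/* IRQs already off */

	memcpy((void *)poking_addr, opcode, len);	/* write via the alias */

	/* This clears this CPU's bit in mm_cpumask(text_poke_mm): */
	unuse_temporary_mm(prev_mm);

	/*
	 * With no CPU left in mm_cpumask(text_poke_mm), this flush does
	 * not need to send IPIs -- which it must not do anyway, since
	 * IRQs are still disabled at this point.
	 */
	flush_tlb_mm_range(text_poke_mm, poking_addr, poking_addr + len,
			   PAGE_SHIFT, false);
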
Notably, the whole {,un}use_temporary_mm() machinery requires preemption
to be disabled across it, with the express purpose of keeping all TLB
state CPU-local, so that invalidations can stay local as well.

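That is, the intended usage pattern looks roughly like this (a sketch of
the calling convention, not verbatim kernel code; 'temp_mm' stands for
e.g. the text-poking mm):

	local_irq_save(flags);			/* keep everything CPU-local */
	prev_mm = use_temporary_mm(temp_mm);

	/* ... access mappings that exist only in temp_mm ... */

	unuse_temporary_mm(prev_mm);		/* back to prev_mm, same CPU */
	local_irq_restore(flags);
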
However, as a side effect, this violates the WARN() above, which makes
some sense for the normal case, but very much doesn't make sense here.

Mark the mm_structs that are used as temporary mms (the text-poking mm
and the EFI mm) such that a further exception (beyond init_mm) can be
grafted into the check, keeping the warning intact for all the other
cases.

Reported-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@...el.com>
Reported-by: Jani Nikula <jani.nikula@...ux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@...radead.org>
Signed-off-by: Ingo Molnar <mingo@...nel.org>
Cc: Andrew Cooper <andrew.cooper3@...rix.com>
Cc: Andy Lutomirski <luto@...nel.org>
Cc: Brian Gerst <brgerst@...il.com>
Cc: H. Peter Anvin <hpa@...or.com>
Cc: Juergen Gross <jgross@...e.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Rik van Riel <riel@...riel.com>
Link: https://lore.kernel.org/r/20250430081154.GH4439@noisy.programming.kicks-ass.net
---
 arch/x86/include/asm/mmu.h         |  4 ++--
 arch/x86/include/asm/mmu_context.h | 10 ++++++++++
 arch/x86/mm/init.c                 |  3 +++
 arch/x86/mm/tlb.c                  |  3 ++-
 arch/x86/platform/efi/efi_64.c     |  1 +
 5 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 8b8055a..0fe9c56 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -16,6 +16,8 @@
 #define MM_CONTEXT_LOCK_LAM		2
 /* Allow LAM and SVA coexisting */
 #define MM_CONTEXT_FORCE_TAGGED_SVA	3
+/* Tracks mm_cpumask */
+#define MM_CONTEXT_NOTRACK		4
 
 /*
  * x86 has arch-specific MMU state beyond what lives in mm_struct.
@@ -44,9 +46,7 @@ typedef struct {
 	struct ldt_struct	*ldt;
 #endif
 
-#ifdef CONFIG_X86_64
 	unsigned long flags;
-#endif
 
 #ifdef CONFIG_ADDRESS_MASKING
 	/* Active LAM mode: X86_CR3_LAM_U48 or X86_CR3_LAM_U57 or 0 (disabled) */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index c511f85..73bf3b1 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -247,6 +247,16 @@ static inline bool is_64bit_mm(struct mm_struct *mm)
 }
 #endif
 
+static inline bool is_notrack_mm(struct mm_struct *mm)
+{
+	return test_bit(MM_CONTEXT_NOTRACK, &mm->context.flags);
+}
+
+static inline void set_notrack_mm(struct mm_struct *mm)
+{
+	set_bit(MM_CONTEXT_NOTRACK, &mm->context.flags);
+}
+
 /*
  * We only want to enforce protection keys on the current process
  * because we effectively have no access to PKRU for other
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index f8c74d1..aa56d9a 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -28,6 +28,7 @@
 #include <asm/text-patching.h>
 #include <asm/memtype.h>
 #include <asm/paravirt.h>
+#include <asm/mmu_context.h>
 
 /*
  * We need to define the tracepoints somewhere, and tlb.c
@@ -830,6 +831,8 @@ void __init poking_init(void)
 	/* Xen PV guests need the PGD to be pinned. */
 	paravirt_enter_mmap(text_poke_mm);
 
+	set_notrack_mm(text_poke_mm);
+
 	/*
 	 * Randomize the poking address, but make sure that the following page
 	 * will be mapped at the same PMD. We need 2 pages, so find space for 3,
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 39761c7..f5b990e 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -847,7 +847,8 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		 * mm_cpumask. The TLB shootdown code can figure out from
 		 * cpu_tlbstate_shared.is_lazy whether or not to send an IPI.
 		 */
-		if (IS_ENABLED(CONFIG_DEBUG_VM) && WARN_ON_ONCE(prev != &init_mm &&
+		if (IS_ENABLED(CONFIG_DEBUG_VM) &&
+		    WARN_ON_ONCE(prev != &init_mm && !is_notrack_mm(prev) &&
 				!cpumask_test_cpu(cpu, mm_cpumask(next))))
 			cpumask_set_cpu(cpu, mm_cpumask(next));
 
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index a5d3496..ce4c08a 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -89,6 +89,7 @@ int __init efi_alloc_page_tables(void)
 	efi_mm.pgd = efi_pgd;
 	mm_init_cpumask(&efi_mm);
 	init_new_context(NULL, &efi_mm);
+	set_notrack_mm(&efi_mm);
 
 	return 0;