[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250430081154.GH4439@noisy.programming.kicks-ass.net>
Date: Wed, 30 Apr 2025 10:11:54 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Hugh Dickins <hughd@...gle.com>
Cc: Jani Nikula <jani.nikula@...ux.intel.com>,
"Borah, Chaitanya Kumar" <chaitanya.kumar.borah@...el.com>,
"luto@...nel.org" <luto@...nel.org>,
"intel-gfx@...ts.freedesktop.org" <intel-gfx@...ts.freedesktop.org>,
"intel-xe@...ts.freedesktop.org" <intel-xe@...ts.freedesktop.org>,
"Kurmi, Suresh Kumar" <suresh.kumar.kurmi@...el.com>,
"Saarinen, Jani" <jani.saarinen@...el.com>,
"De Marchi, Lucas" <lucas.demarchi@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>, riel@...riel.com
Subject: Re: [REGRESSION] x86/efi: Make efi_enter/leave_mm() use the
use_/unuse_temporary_mm() machinery (linux-next)
On Tue, Apr 29, 2025 at 11:07:45PM -0700, Hugh Dickins wrote:
> On Tue, 29 Apr 2025, Peter Zijlstra wrote:
> > diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> > index 79c124f6f3f2..39761c7765bd 100644
> > --- a/arch/x86/mm/tlb.c
> > +++ b/arch/x86/mm/tlb.c
> > @@ -986,6 +986,7 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
> > struct mm_struct *prev_mm;
> >
> > lockdep_assert_preemption_disabled();
> > + guard(irqsave)();
> >
> > /*
> > * Make sure not to be in TLB lazy mode, as otherwise we'll end up
> > @@ -1018,6 +1019,7 @@ struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
> > void unuse_temporary_mm(struct mm_struct *prev_mm)
> > {
> > lockdep_assert_preemption_disabled();
> > + guard(irqsave)();
> >
> > /* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
> > cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
>
> Hi Peter, I haven't checked on most recent -nexts, but earlier found that
> patch to be not quite enough, at least if you have CONFIG_DEBUG_VM=y:
> because switch_mm_irqs_off() contains a
>
> VM_WARN_ON_ONCE(prev != &init_mm && !cpumask_test_cpu(cpu,
> mm_cpumask(prev)));
>
> which doesn't like what (un)use_temporary_mm() is now doing. I couldn't
> be sure who was right or wrong, and just proceeded by commenting out
> the warning - ONCE shouldn't be much trouble, except xfstests uses
> some nefarious mechanism to resurrect ONCE repeatedly.
Oh that one. Yeah, I thought Ingo had already delete that WARN, but it
seems it's still there.
So the problem is that unuse_temporary_mm() explicitly clears that bit;
and it has to, because otherwise the flush_tlb_mm_range() in
__text_poke() will try sending IPIs, which are not at all needed.
(See also:
https://lore.kernel.org/all/20241113095550.GBZzR3pg-RhJKPDazS@fat_crate.local/
)
Notably, the whole {,un}use_temporary_mm() thing requires preemption to
be disabled across it with the express purpose of keeping all TLB
nonsense CPU local, such that invalidations can also stay local etc.
However, as a side-effect, we violate this above WARN(), which sorta
makes sense for the normal case, but very much doesn't make sense here.
There are two ways out, one have unuse_temporary_mm() mark the mm_struct
such that a further exception (beyond init_mm) can be grafted, or simply
delete the whole check.
Anyway, something like the below, or just delete the check I suppose.
Opinions?
---
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 8b8055a8eb9e..0fe9c569d171 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -16,6 +16,8 @@
#define MM_CONTEXT_LOCK_LAM 2
/* Allow LAM and SVA coexisting */
#define MM_CONTEXT_FORCE_TAGGED_SVA 3
+/* Tracks mm_cpumask */
+#define MM_CONTEXT_NOTRACK 4
/*
* x86 has arch-specific MMU state beyond what lives in mm_struct.
@@ -44,9 +46,7 @@ typedef struct {
struct ldt_struct *ldt;
#endif
-#ifdef CONFIG_X86_64
unsigned long flags;
-#endif
#ifdef CONFIG_ADDRESS_MASKING
/* Active LAM mode: X86_CR3_LAM_U48 or X86_CR3_LAM_U57 or 0 (disabled) */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index c511f8584ae4..73bf3b1b44e8 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -247,6 +247,16 @@ static inline bool is_64bit_mm(struct mm_struct *mm)
}
#endif
+static inline bool is_notrack_mm(struct mm_struct *mm)
+{
+ return test_bit(MM_CONTEXT_NOTRACK, &mm->context.flags);
+}
+
+static inline void set_notrack_mm(struct mm_struct *mm)
+{
+ set_bit(MM_CONTEXT_NOTRACK, &mm->context.flags);
+}
+
/*
* We only want to enforce protection keys on the current process
* because we effectively have no access to PKRU for other
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index f8c74d19bebb..aa56d9ac0b8f 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -28,6 +28,7 @@
#include <asm/text-patching.h>
#include <asm/memtype.h>
#include <asm/paravirt.h>
+#include <asm/mmu_context.h>
/*
* We need to define the tracepoints somewhere, and tlb.c
@@ -830,6 +831,8 @@ void __init poking_init(void)
/* Xen PV guests need the PGD to be pinned. */
paravirt_enter_mmap(text_poke_mm);
+ set_notrack_mm(text_poke_mm);
+
/*
* Randomize the poking address, but make sure that the following page
* will be mapped at the same PMD. We need 2 pages, so find space for 3,
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 1451e022129a..25bfc3305158 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -852,7 +852,8 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
* mm_cpumask. The TLB shootdown code can figure out from
* cpu_tlbstate_shared.is_lazy whether or not to send an IPI.
*/
- if (IS_ENABLED(CONFIG_DEBUG_VM) && WARN_ON_ONCE(prev != &init_mm &&
+ if (IS_ENABLED(CONFIG_DEBUG_VM) &&
+ WARN_ON_ONCE(prev != &init_mm && !is_notrack_mm(prev) &&
!cpumask_test_cpu(cpu, mm_cpumask(next))))
cpumask_set_cpu(cpu, mm_cpumask(next));
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 8e1796dd6c68..e7e8f77f77f8 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -89,6 +89,7 @@ int __init efi_alloc_page_tables(void)
efi_mm.pgd = efi_pgd;
mm_init_cpumask(&efi_mm);
init_new_context(NULL, &efi_mm);
+ set_notrack_mm(&efi_mm);
return 0;
Powered by blists - more mailing lists