[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+i-1C16x5u-1qAqDPSONgs+pGWrfUTO1zq2r6Rrvq=q48NHpw@mail.gmail.com>
Date: Tue, 11 Feb 2025 11:02:27 +0100
From: Brendan Jackman <jackmanb@...gle.com>
To: Rik van Riel <riel@...riel.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, bp@...en8.de,
peterz@...radead.org, dave.hansen@...ux.intel.com, zhengqi.arch@...edance.com,
nadav.amit@...il.com, thomas.lendacky@....com, kernel-team@...a.com,
linux-mm@...ck.org, akpm@...ux-foundation.org, jannh@...gle.com,
mhklinux@...look.com, andrew.cooper3@...rix.com,
Manali Shukla <Manali.Shukla@....com>
Subject: Re: [PATCH v9 10/12] x86/mm: do targeted broadcast flushing from
tlbbatch code
On Tue, 11 Feb 2025 at 04:50, Rik van Riel <riel@...riel.com> wrote:
>
> On Mon, 2025-02-10 at 16:27 +0100, Brendan Jackman wrote:
> > On Thu, 6 Feb 2025 at 05:46, Rik van Riel <riel@...riel.com> wrote:
> > > /* Wait for INVLPGB originated by this CPU to complete. */
> > > -static inline void tlbsync(void)
> > > +static inline void __tlbsync(void)
> > > {
> > > - cant_migrate();
> >
> > Why does this have to go away?
>
> I'm not sure the current task in sched_init() has
> all the correct bits set to prevent the warning
> from firing, but on the flip side it won't have
> called INVLPGB yet at that point, so the call to
> enter_lazy_tlb() won't actually end up here.
>
> I'll put it back.
Sounds good.
FWIW I think if we do run into early-boot code hitting false
DEBUG_ATOMIC_SLEEP warnings, the best response might be to update the
DEBUG_ATOMIC_SLEEP code. Like maybe there's a more targeted solution
but something roughly equivalent to checking if (system_state ==
SYSTEM_STATE_SCHEDULING) before the warning.
> > > @@ -794,6 +825,8 @@ void switch_mm_irqs_off(struct mm_struct
> > > *unused, struct mm_struct *next,
> > > if (IS_ENABLED(CONFIG_PROVE_LOCKING))
> > > WARN_ON_ONCE(!irqs_disabled());
> > >
> > > + tlbsync();
> > > +
> > > /*
> > > * Verify that CR3 is what we think it is. This will catch
> > > * hypothetical buggy code that directly switches to
> > > swapper_pg_dir
> > > @@ -973,6 +1006,8 @@ void switch_mm_irqs_off(struct mm_struct
> > > *unused, struct mm_struct *next,
> > > */
> > > void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
> > > {
> > > + tlbsync();
> > > +
> >
> > I have a feeling I'll look stupid for asking this, but why do we need
> > this and the one in switch_mm_irqs_off()?
>
> This is an architectural thing: TLBSYNC waits for
> the INVLPGB flushes to finish that were issued
> from the same CPU.
>
> That means if we have pending flushes (from the
> pageout code), we need to wait for them at context
> switch time, before the task could potentially be
> migrated to another CPU.
Oh right thanks, that makes sense.
So I think here we're encoding the assumption that context_switch()
always calls either enter_lazy_tlb() or switch_mm_irqs_off(), which is
a little awkward, plus the job of these functions is already kinda
hazy and this makes it even hazier. What about doing it in
arch_start_context_switch() instead?
That would mean a bit of plumbing since we'd still wanna have the
tlbsync() in tlb.c, but that seems worth it to me. Plus, having it in
one place would give us a spot to add a comment. Now that you point it
out it does indeed seem obvious but it didn't seem so yesterday.
Now I think about it... if we always tlbsync() before a context
switch, is the cant_migrate() above actually required? I think with
that, even if we migrated in the middle of e.g.
broadcast_kernel_range_flush(), we'd be fine? (At least, from the
specific perspective of the invplgb code, presumably having preemption
on there would break things horribly in other ways).
Powered by blists - more mailing lists