[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zeos1arCH11X_sXv@google.com>
Date: Thu, 7 Mar 2024 21:08:37 +0000
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Andrew Morton <akpm@...ux-foundation.org>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...nel.org>, x86@...nel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 2/3] x86/mm: make sure LAM is up-to-date during
context switching
On Thu, Mar 07, 2024 at 09:56:07AM -0800, Dave Hansen wrote:
> On 3/7/24 09:29, Kirill A. Shutemov wrote:
> > On Thu, Mar 07, 2024 at 01:39:15PM +0000, Yosry Ahmed wrote:
> >> During context switching, if we are not switching to new mm and no TLB
> >> flush is needed, we do not write CR3. However, it is possible that a
> >> user thread enables LAM while a kthread is running on a different CPU
> >> with the old LAM CR3 mask. If the kthread context switches into any
> >> thread of that user process, it may not write CR3 with the new LAM mask,
> >> which would cause the user thread to run with a misconfigured CR3 that
> >> disables LAM on the CPU.
> > I don't think it is possible. As I said we can only enable LAM when the
> > process has single thread. If it enables LAM concurrently with kernel
> > thread and kernel thread gets control on the same CPU after the userspace
> > thread of the same process LAM is already going to be enabled. No need in
> > special handling.
>
> I think it's something logically like this:
>
> // main thread
> kthread_use_mm()
> cr3 |= mm->lam_cr3_mask;
> mm->lam_cr3_mask = foo;
> cpu_tlbstate.lam = mm->lam_cr3_mask;
IIUC it doesn't have to be through kthread_use_mm(). If we context
switch directly from the user thread to a kthread, the kthread will keep
using the user thread's mm AFAICT.
>
> Obviously the kthread's LAM state is going to be random. It's
> fundamentally racing with the enabling thread. That part is fine.
>
> The main pickle is the fact that CR3 and cpu_tlbstate.lam are out of
> sync. That seems worth fixing.
That's what is fixed by patch 1, specifically a race between
switch_mm_irqs_off() and LAM being enabled. This patch is fixing a
different problem:
CPU 1 CPU 2
/* user thread running */
context_switch() /* to kthread */
/* user thread enables LAM */
context_switch()
context_switch() /* to user thread */
In this case, there are no races, but the second context switch on CPU 1
may not write CR3 (if TLB is up-to-date), in which case we will run the
user thread with CR3 having the wrong LAM mask. This could cause bigger
problems, right?
>
> Or is there something else that keeps this whole thing from racing in
> the first place?
+1 that would be good to know, but I didn't find anything.
Powered by blists - more mailing lists