linux-kernel - Re: [RFC PATCH 2/3] x86/mm: make sure LAM is up-to-date during context switching

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Zeos1arCH11X_sXv@google.com>
Date: Thu, 7 Mar 2024 21:08:37 +0000
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>, 
	Andrew Morton <akpm@...ux-foundation.org>, Thomas Gleixner <tglx@...utronix.de>, 
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, Peter Zijlstra <peterz@...radead.org>, 
	Andy Lutomirski <luto@...nel.org>, x86@...nel.org, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 2/3] x86/mm: make sure LAM is up-to-date during
 context switching

On Thu, Mar 07, 2024 at 09:56:07AM -0800, Dave Hansen wrote:
> On 3/7/24 09:29, Kirill A. Shutemov wrote:
> > On Thu, Mar 07, 2024 at 01:39:15PM +0000, Yosry Ahmed wrote:
> >> During context switching, if we are not switching to new mm and no TLB
> >> flush is needed, we do not write CR3. However, it is possible that a
> >> user thread enables LAM while a kthread is running on a different CPU
> >> with the old LAM CR3 mask. If the kthread context switches into any
> >> thread of that user process, it may not write CR3 with the new LAM mask,
> >> which would cause the user thread to run with a misconfigured CR3 that
> >> disables LAM on the CPU.
> > I don't think it is possible. As I said we can only enable LAM when the
> > process has single thread. If it enables LAM concurrently with kernel
> > thread and kernel thread gets control on the same CPU after the userspace
> > thread of the same process LAM is already going to be enabled. No need in
> > special handling.
> 
> I think it's something logically like this:
> 
> 						// main thread
> 	kthread_use_mm()
> 	cr3 |= mm->lam_cr3_mask;
> 						mm->lam_cr3_mask = foo;
> 	cpu_tlbstate.lam = mm->lam_cr3_mask;

IIUC it doesn't have to be through kthread_use_mm(). If we context
switch directly from the user thread to a kthread, the kthread will keep
using the user thread's mm AFAICT.

> 
> Obviously the kthread's LAM state is going to be random.  It's
> fundamentally racing with the enabling thread.  That part is fine.
> 
> The main pickle is the fact that CR3 and cpu_tlbstate.lam are out of
> sync.  That seems worth fixing.

That's what is fixed by patch 1, specifically a race between
switch_mm_irqs_off() and LAM being enabled. This patch is fixing a
different problem:

CPU 1                                   CPU 2
/* user thread running */
context_switch() /* to kthread */
                                        /* user thread enables LAM */
                                        context_switch()
context_switch() /* to user thread */

In this case, there are no races, but the second context switch on CPU 1
may not write CR3 (if TLB is up-to-date), in which case we will run the
user thread with CR3 having the wrong LAM mask. This could cause bigger
problems, right?

> 
> Or is there something else that keeps this whole thing from racing in
> the first place?

+1 that would be good to know, but I didn't find anything.