linux-kernel - Re: [RFC v2 1/2] [NEEDS HELP] x86/mm: Handle unlazying membarrier core sync in the arch code

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <A61977A7-F0B2-4492-AB6D-06E24417FA59@gmail.com>
Date:   Fri, 4 Dec 2020 00:17:29 -0800
From:   Nadav Amit <nadav.amit@...il.com>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     Nicholas Piggin <npiggin@...il.com>,
        Anton Blanchard <anton@...abs.org>,
        Arnd Bergmann <arnd@...db.de>,
        linux-arch <linux-arch@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        X86 ML <x86@...nel.org>, Will Deacon <will@...nel.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Rik van Riel <riel@...riel.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Jann Horn <jannh@...gle.com>
Subject: Re: [RFC v2 1/2] [NEEDS HELP] x86/mm: Handle unlazying membarrier
 core sync in the arch code

I am not very familiar with membarrier, but here are my 2 cents while trying
to answer your questions.

> On Dec 3, 2020, at 9:26 PM, Andy Lutomirski <luto@...nel.org> wrote:
> @@ -496,6 +497,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> 		 * from one thread in a process to another thread in the same
> 		 * process. No TLB flush required.
> 		 */
> +
> +		// XXX: why is this okay wrt membarrier?
> 		if (!was_lazy)
> 			return;

I am confused.

On one hand, it seems that membarrier_private_expedited() would issue an IPI
to that core, as it would find that this core’s cpu_rq(cpu)->curr->mm is the
same as the one that the membarrier applies to. But… (see below)


> @@ -508,12 +511,24 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> 		smp_mb();
> 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
> 		if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
> -				next_tlb_gen)
> +		    next_tlb_gen) {
> +			/*
> +			 * We're reactivating an mm, and membarrier might
> +			 * need to serialize.  Tell membarrier.
> +			 */
> +
> +			// XXX: I can't understand the logic in
> +			// membarrier_mm_sync_core_before_usermode().  What's
> +			// the mm check for?
> +			membarrier_mm_sync_core_before_usermode(next);

On the other hand the reason for this mm check that you mention contradicts
my previous understanding as the git log says:

commit 2840cf02fae627860156737e83326df354ee4ec6
Author: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Date:   Thu Sep 19 13:37:01 2019 -0400

    sched/membarrier: Call sync_core only before usermode for same mm
    
    When the prev and next task's mm change, switch_mm() provides the core
    serializing guarantees before returning to usermode. The only case
    where an explicit core serialization is needed is when the scheduler
    keeps the same mm for prev and next.

> 	/*
> 	 * When switching through a kernel thread, the loop in
> 	 * membarrier_{private,global}_expedited() may have observed that
> 	 * kernel thread and not issued an IPI. It is therefore possible to
> 	 * schedule between user->kernel->user threads without passing though
> 	 * switch_mm(). Membarrier requires a barrier after storing to
> -	 * rq->curr, before returning to userspace, so provide them here:
> +	 * rq->curr, before returning to userspace, and mmdrop() provides
> +	 * this barrier.
> 	 *
> -	 * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly
> -	 *   provided by mmdrop(),
> -	 * - a sync_core for SYNC_CORE.
> +	 * XXX: I don't think mmdrop() actually does this.  There's no
> +	 * smp_mb__before/after_atomic() in there.

I presume that since x86 is the only one that needs
membarrier_mm_sync_core_before_usermode(), nobody noticed the missing
smp_mb__before/after_atomic(). These are anyhow a compiler barrier in x86,
and such a barrier would take place before the return to userspace.