linux-kernel - [RFC PATCH 1/2] sched: Fix exit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <20200728160010.3314-1-mathieu.desnoyers@efficios.com>
Date:   Tue, 28 Jul 2020 12:00:09 -0400
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Will Deacon <will@...nel.org>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Nicholas Piggin <npiggin@...il.com>,
        Andy Lutomirski <luto@...capital.net>,
        Thomas Gleixner <tglx@...utronix.de>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Alan Stern <stern@...land.harvard.edu>, linux-mm@...ck.org
Subject: [RFC PATCH 1/2] sched: Fix exit_mm vs membarrier

exit_mm should issue memory barriers after user-space memory accesses,
before clearing current->mm, to order user-space memory accesses
performed prior to exit_mm before clearing tsk->mm, which has the
effect of skipping the membarrier private expedited IPIs.

The membarrier system call can be issued concurrently with do_exit
if we have thread groups created with CLONE_VM but not CLONE_THREAD.

The following comment in exit_mm() seems rather puzzling though:

        /* more a memory barrier than a real lock */
        task_lock(current);

So considering that spinlocks are not full memory barriers nowadays,
some digging into the origins of this comment led me to 2002, at the
earliest commit tracked by Thomas Gleixner's bitkeeper era's history
at https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/

At that point, this comment was followed by:

+               /* more a memory barrier than a real lock */
+               task_lock(tsk);
+               tsk->mm = NULL;
+               task_unlock(tsk);

Which seems to indicate that grabbing the lock is really about acting as
a memory barrier, but I wonder if it is meant to be a full barrier or
just an acquire.

If a full memory barrier happens to be needed even without membarrier,
perhaps this fix also corrects other issues ? It unclear from the
comment what this memory barrier orders though. Is it the chain
exit -> waitpid, or is it related to entering lazy TLB ?

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Peter Zijlstra (Intel) <peterz@...radead.org>
Cc: Will Deacon <will@...nel.org>
Cc: Paul E. McKenney <paulmck@...nel.org>
Cc: Nicholas Piggin <npiggin@...il.com>
Cc: Andy Lutomirski <luto@...capital.net>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Alan Stern <stern@...land.harvard.edu>
Cc: linux-mm@...ck.org
---
 kernel/exit.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/exit.c b/kernel/exit.c
index 727150f28103..ce272ec55cdc 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -474,6 +474,14 @@ static void exit_mm(void)
 	BUG_ON(mm != current->active_mm);
 	/* more a memory barrier than a real lock */
 	task_lock(current);
+	/*
+	 * When a thread stops operating on an address space, the loop
+	 * in membarrier_{private,global}_expedited() may not observe
+	 * that tsk->mm, and not issue an IPI. Membarrier requires a
+	 * memory barrier after accessing user-space memory, before
+	 * clearing tsk->mm.
+	 */
+	smp_mb();
 	current->mm = NULL;
 	mmap_read_unlock(mm);
 	enter_lazy_tlb(mm, current);
-- 
2.11.0