Message-Id: <20190903160036.2400-4-mathieu.desnoyers@efficios.com>
Date: Tue, 3 Sep 2019 12:00:36 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Peter Zijlstra <peterz@...radead.org>,
Oleg Nesterov <oleg@...hat.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Russell King - ARM Linux admin <linux@...linux.org.uk>,
Chris Metcalf <cmetcalf@...hip.com>,
Christoph Lameter <cl@...ux.com>,
Kirill Tkhai <tkhai@...dex.ru>, Mike Galbraith <efault@....de>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>
Cc: linux-kernel@...r.kernel.org,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Subject: [RFC PATCH 3/3] Fix: sched/membarrier: use probe_kernel_address to read mm->membarrier_state

membarrier_global_expedited() reads the mm->membarrier_state of each
runqueue's current task to test a flag. A concurrent task exit can free
the memory backing the struct mm before that read takes place.
Therefore, use probe_kernel_address() to read the flag. If the read
fails, treat the flag as unset: a failed read means the struct mm has
been freed, in which case the scheduler code already provides the
barriers required by membarrier, and sending an IPI to that CPU is
redundant.
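
For readers unfamiliar with probe_kernel_address(), here is a minimal,
self-contained sketch (not part of the patch; the helper name
read_membarrier_state_safely() is made up for illustration) of the
pattern applied below: read an int-sized field from a struct whose
backing memory may already have been freed, falling back to 0 when the
read faults.

#include <linux/uaccess.h>      /* probe_kernel_address() */
#include <linux/mm_types.h>     /* struct mm_struct */

/* Hypothetical helper, for illustration only. */
static int read_membarrier_state_safely(struct mm_struct *mm)
{
        int membarrier_state;

        /*
         * probe_kernel_address() performs the copy with page faults
         * disabled and returns -EFAULT instead of oopsing when the
         * address cannot be read. mm->membarrier_state is an atomic_t,
         * hence the read of its .counter field.
         */
        if (probe_kernel_address(&mm->membarrier_state.counter,
                                 membarrier_state))
                membarrier_state = 0;   /* freed mm: treat flags as unset */

        return membarrier_state;
}
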
There is ongoing discussion about removing the need for
probe_kernel_address() in this code by providing additional existence
guarantees for struct mm. This patch is submitted as a fix intended for
backport to prior stable kernel releases.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Oleg Nesterov <oleg@...hat.com>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Russell King - ARM Linux admin <linux@...linux.org.uk>
Cc: Chris Metcalf <cmetcalf@...hip.com>
Cc: Christoph Lameter <cl@...ux.com>
Cc: Kirill Tkhai <tkhai@...dex.ru>
Cc: Mike Galbraith <efault@....de>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Ingo Molnar <mingo@...nel.org>
---
kernel/sched/membarrier.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 02feb7c8da4f..0312604d8315 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -58,6 +58,8 @@ static int membarrier_global_expedited(void)
cpus_read_lock();
for_each_online_cpu(cpu) {
struct task_struct *p;
+ struct mm_struct *mm;
+ int membarrier_state;
/*
* Skipping the current CPU is OK even through we can be
@@ -72,17 +74,34 @@ static int membarrier_global_expedited(void)
rcu_read_lock();
p = task_rcu_dereference(&cpu_rq(cpu)->curr);
- if (p) {
- struct mm_struct *mm = READ_ONCE(p->mm);
+ if (!p)
+ goto next;
+ mm = READ_ONCE(p->mm);
+ if (!mm)
+ goto next;
- if (mm && (atomic_read(&mm->membarrier_state) &
- MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
- if (!fallback)
- __cpumask_set_cpu(cpu, tmpmask);
- else
- smp_call_function_single(cpu, ipi_mb, NULL, 1);
- }
+ /*
+ * Read membarrier_state with probe_kernel_address() rather than
+ * atomic_read() to deal with the fact that the mm may have been
+ * concurrently freed. If probe_kernel_address() fails, the mm
+ * struct was freed, so there is no need to issue a barrier on
+ * that particular CPU: the scheduler code is taking care of
+ * it.
+ *
+ * It does not matter whether probe_kernel_address() reads
+ * membarrier_state piece-wise, given that we only care about
+ * testing a single bit.
+ */
+ if (probe_kernel_address(&mm->membarrier_state.counter,
+ membarrier_state))
+ membarrier_state = 0;
+ if (membarrier_state & MEMBARRIER_STATE_GLOBAL_EXPEDITED) {
+ if (!fallback)
+ __cpumask_set_cpu(cpu, tmpmask);
+ else
+ smp_call_function_single(cpu, ipi_mb, NULL, 1);
}
+ next:
rcu_read_unlock();
}
if (!fallback) {
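
For context, a hedged userspace sketch (not part of the patch) of how
this code path gets exercised: MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED
sets the MEMBARRIER_STATE_GLOBAL_EXPEDITED flag tested above in the
caller's mm->membarrier_state, and MEMBARRIER_CMD_GLOBAL_EXPEDITED ends
up invoking membarrier_global_expedited().

#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
        /* Opt this process in to global expedited membarriers. */
        if (syscall(__NR_membarrier,
                    MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED, 0))
                perror("membarrier register");

        /* Issue barriers on every CPU running a registered process. */
        if (syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL_EXPEDITED, 0))
                perror("membarrier");
        return 0;
}
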
--
2.17.1