[<prev] [next>] [day] [month] [year] [list]
Message-ID: <159540914264.4006.8441424283734137782.tip-bot2@tip-bot2>
Date: Wed, 22 Jul 2020 09:12:22 -0000
From: "tip-bot2 for Peter Zijlstra" <tip-bot2@...utronix.de>
To: linux-tip-commits@...r.kernel.org
Cc: Jiri Slaby <jirislaby@...nel.org>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Oleg Nesterov <oleg@...hat.com>,
Paul Gortmaker <paul.gortmaker@...driver.com>,
Christian Brauner <christian.brauner@...ntu.com>,
x86 <x86@...nel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: [tip: sched/urgent] sched: Fix race against ptrace_freeze_trace()
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: d136122f58458479fd8926020ba2937de61d7f65
Gitweb: https://git.kernel.org/tip/d136122f58458479fd8926020ba2937de61d7f65
Author: Peter Zijlstra <peterz@...radead.org>
AuthorDate: Mon, 20 Jul 2020 17:20:21 +02:00
Committer: Peter Zijlstra <peterz@...radead.org>
CommitterDate: Wed, 22 Jul 2020 10:22:00 +02:00
sched: Fix race against ptrace_freeze_trace()
There is apparently one site that violates the rule that only current
and ttwu() will modify task->state, namely ptrace_{,un}freeze_traced()
will change task->state for a remote task.
Oleg explains:
"TASK_TRACED/TASK_STOPPED was always protected by siglock. In
particular, ttwu(__TASK_TRACED) must be always called with siglock
held. That is why ptrace_freeze_traced() assumes it can safely do
s/TASK_TRACED/__TASK_TRACED/ under spin_lock(siglock)."
This breaks the ordering scheme introduced by commit:
dbfb089d360b ("sched: Fix loadavg accounting race")
Specifically, the reload not matching no longer implies we don't have
to block.
Simply things by noting that what we need is a LOAD->STORE ordering
and this can be provided by a control dependency.
So replace:
prev_state = prev->state;
raw_spin_lock(&rq->lock);
smp_mb__after_spinlock(); /* SMP-MB */
if (... && prev_state && prev_state == prev->state)
deactivate_task();
with:
prev_state = prev->state;
if (... && prev_state) /* CTRL-DEP */
deactivate_task();
Since that already implies the 'prev->state' load must be complete
before allowing the 'prev->on_rq = 0' store to become visible.
Fixes: dbfb089d360b ("sched: Fix loadavg accounting race")
Reported-by: Jiri Slaby <jirislaby@...nel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Acked-by: Oleg Nesterov <oleg@...hat.com>
Tested-by: Paul Gortmaker <paul.gortmaker@...driver.com>
Tested-by: Christian Brauner <christian.brauner@...ntu.com>
---
kernel/sched/core.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e15543c..5dece9b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4119,9 +4119,6 @@ static void __sched notrace __schedule(bool preempt)
local_irq_disable();
rcu_note_context_switch(preempt);
- /* See deactivate_task() below. */
- prev_state = prev->state;
-
/*
* Make sure that signal_pending_state()->signal_pending() below
* can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
@@ -4145,11 +4142,16 @@ static void __sched notrace __schedule(bool preempt)
update_rq_clock(rq);
switch_count = &prev->nivcsw;
+
/*
- * We must re-load prev->state in case ttwu_remote() changed it
- * before we acquired rq->lock.
+ * We must load prev->state once (task_struct::state is volatile), such
+ * that:
+ *
+ * - we form a control dependency vs deactivate_task() below.
+ * - ptrace_{,un}freeze_traced() can change ->state underneath us.
*/
- if (!preempt && prev_state && prev_state == prev->state) {
+ prev_state = prev->state;
+ if (!preempt && prev_state) {
if (signal_pending_state(prev_state, prev)) {
prev->state = TASK_RUNNING;
} else {
@@ -4163,10 +4165,12 @@ static void __sched notrace __schedule(bool preempt)
/*
* __schedule() ttwu()
- * prev_state = prev->state; if (READ_ONCE(p->on_rq) && ...)
- * LOCK rq->lock goto out;
- * smp_mb__after_spinlock(); smp_acquire__after_ctrl_dep();
- * p->on_rq = 0; p->state = TASK_WAKING;
+ * prev_state = prev->state; if (p->on_rq && ...)
+ * if (prev_state) goto out;
+ * p->on_rq = 0; smp_acquire__after_ctrl_dep();
+ * p->state = TASK_WAKING
+ *
+ * Where __schedule() and ttwu() have matching control dependencies.
*
* After this, schedule() must not care about p->state any more.
*/
Powered by blists - more mailing lists