Message-ID: <jhjlfg6qqum.mognet@arm.com>
Date:   Fri, 16 Oct 2020 13:48:17 +0100
From:   Valentin Schneider <valentin.schneider@....com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     tglx@...utronix.de, mingo@...nel.org, linux-kernel@...r.kernel.org,
        bigeasy@...utronix.de, qais.yousef@....com, swood@...hat.com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vincent.donnefort@....com,
        tj@...nel.org, ouwen210@...mail.com
Subject: Re: [PATCH v3 10/19] sched: Fix migrate_disable() vs set_cpus_allowed_ptr()


On 15/10/20 12:05, Peter Zijlstra wrote:
> @@ -1862,15 +1875,27 @@ static int migration_cpu_stop(void *data
>        * we're holding p->pi_lock.
>        */
>       if (task_rq(p) == rq) {
> +		if (is_migration_disabled(p))
> +			goto out;
> +
>               if (task_on_rq_queued(p))
>                       rq = __migrate_task(rq, &rf, p, arg->dest_cpu);
>               else
>                       p->wake_cpu = arg->dest_cpu;
> +
> +		if (arg->done) {
> +			p->migration_pending = NULL;
> +			complete = true;

Ok, so there's some nastiness ahead:

P0@CPU0             P1                    P2                      stopper

migrate_disable();
                   sca(P0, {CPU1});
                     <installs pending>
migrate_enable();
  <kicks stopper>
                                          sca(P0, {CPU0});
                                             <locks>
                                             <local, has pending:
                                              goto do_complete>
                                             <unlocks>
                                             complete_all();
                                             refcount_dec();
                     refcount_dec();
                   <done>
                                           <done>

                                                                  <locks>
                                                                  <fiddles with pending->arg->done>

First, P2 can clear p->migration_pending before the stopper gets to run.
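
For reference, the path P2 takes above is the "task already runs within the
new mask" fast path of affine_move_task(). Reconstructed from the series
rather than quoted verbatim (so details may differ), it has roughly this
shape, with both p->pi_lock and rq->lock held until the unlock:

	if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
		pending = p->migration_pending;
		if (pending) {
			refcount_inc(&pending->refs);
			/* A stopper kicked by migrate_enable() may still run! */
			p->migration_pending = NULL;
			complete = true;
		}
		task_rq_unlock(rq, p, rf);

		if (complete)
			complete_all(&pending->done); /* no pi / rq locks held */
		return 0;
	}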

Second, the complete_all() is done without pi / rq locks held, and P2 might
get to it before the stopper does. This can cause &pending to be popped off
the stack before the stopper gets to it, so perhaps we need the below hunk.
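
To spell the lifetime issue out: IIUC the pending struct lives on the stack
of the task that called set_cpus_allowed_ptr(), and arg->done points into
that frame. A rough sketch of the waiter side (again reconstructed, not
quoted):

	struct set_affinity_pending my_pending = { };

	refcount_set(&my_pending.refs, 1);
	init_completion(&my_pending.done);
	p->migration_pending = &my_pending;

	/* ... queue the stopper work, drop the locks ... */

	wait_for_completion(&my_pending.done);
	/*
	 * P2's complete_all() lets us return from here (and reuse the stack
	 * frame) while the stopper may still hold arg->done == &my_pending.done.
	 */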

The move_queued_task() from the stopper is "safe" in that we won't kick a
task outside of its allowed mask, although we may move it around for no
reason; I've tried to prevent that in the hunk below as well.

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a5b6eac07adb..1ebf653c2c2f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1859,6 +1859,13 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
 	return rq;
 }
 
+struct set_affinity_pending {
+	refcount_t		refs;
+	struct completion	done;
+	struct cpu_stop_work	stop_work;
+	struct migration_arg	arg;
+};
+
 /*
  * migration_cpu_stop - this will be executed by a highprio stopper thread
  * and performs thread migration by bumping thread off CPU then
@@ -1866,6 +1873,7 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
  */
 static int migration_cpu_stop(void *data)
 {
+	struct set_affinity_pending *pending = NULL;
 	struct migration_arg *arg = data;
 	struct task_struct *p = arg->task;
 	struct rq *rq = this_rq();
@@ -1886,13 +1894,22 @@ static int migration_cpu_stop(void *data)
 
 	raw_spin_lock(&p->pi_lock);
 	rq_lock(rq, &rf);
+
+	if (arg->done)
+		pending = container_of(arg->done, struct set_affinity_pending, done);
 	/*
 	 * If task_rq(p) != rq, it cannot be migrated here, because we're
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
 	 * we're holding p->pi_lock.
 	 */
 	if (task_rq(p) == rq) {
-		if (is_migration_disabled(p))
+		/*
+		 * An affinity update may have raced with us.
+		 * p->migration_pending could now be NULL, or could be pointing
+		 * elsewhere entirely.
+		 */
+		if (is_migration_disabled(p) ||
+		    (arg->done && p->migration_pending != pending))
 			goto out;
 
 		if (task_on_rq_queued(p))
@@ -2024,13 +2041,6 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 	__do_set_cpus_allowed(p, new_mask, 0);
 }
 
-struct set_affinity_pending {
-	refcount_t		refs;
-	struct completion	done;
-	struct cpu_stop_work	stop_work;
-	struct migration_arg	arg;
-};
-
 /*
  * This function is wildly self concurrent; here be dragons.
  *
