Message-Id: <20230419155012.63901-1-mathieu.desnoyers@efficios.com>
Date:   Wed, 19 Apr 2023 11:50:11 -0400
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Aaron Lu <aaron.lu@...el.com>
Subject: [RFC PATCH v9 1/2] sched: Fix: Handle target_cpu offlining in active_load_balance_cpu_stop

Handle the scenario where the target CPU goes offline concurrently with
the execution of active_load_balance_cpu_stop(). This can cause
__sched_core_flip() to flip rq->core_enabled without the rq lock held,
which can trigger lockdep_assert_rq_held() warnings.

This scenario may also have other unwanted effects, such as migrating
tasks to offline CPUs, which can keep them from running for a long
time, until the CPU is brought back online.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Aaron Lu <aaron.lu@...el.com>
---
 kernel/sched/fair.c | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5f6587d94c1d..1c837ba41704 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8744,6 +8744,27 @@ static void attach_one_task(struct rq *rq, struct task_struct *p)
 	rq_unlock(rq, &rf);
 }
 
+/*
+ * try_attach_one_task() -- attaches the task returned from detach_one_task() to
+ * its new rq if the rq is online. Returns false if the rq is not online.
+ */
+static bool try_attach_one_task(struct rq *rq, struct task_struct *p)
+{
+	struct rq_flags rf;
+	bool result = true;
+
+	rq_lock(rq, &rf);
+	if (!rq->online) {
+		result = false;
+		goto unlock;
+	}
+	update_rq_clock(rq);
+	attach_task(rq, p);
+unlock:
+	rq_unlock(rq, &rf);
+	return result;
+}
+
 /*
  * attach_tasks() -- attaches all tasks detached by detach_tasks() to their
  * new rq.
@@ -11048,8 +11069,17 @@ static int active_load_balance_cpu_stop(void *data)
 	busiest_rq->active_balance = 0;
 	rq_unlock(busiest_rq, &rf);
 
-	if (p)
-		attach_one_task(target_rq, p);
+	if (p) {
+		if (!try_attach_one_task(target_rq, p)) {
+			/*
+			 * target_rq was offlined concurrently. There is no
+			 * guarantee that the busiest cpu is still online at
+			 * this point. Fall back to using the CPU on which the
+			 * stopper thread is running as target.
+			 */
+			attach_one_task(this_rq(), p);
+		}
+	}
 
 	local_irq_enable();
 
-- 
2.25.1
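
For readers following along outside the kernel tree, the pattern used by
try_attach_one_task() (check the online flag and attach under the same
lock, and fall back to another queue if the check fails) can be modeled in
userspace. This is only a rough sketch under stated assumptions: struct
toy_rq and every toy_* helper are hypothetical stand-ins invented for
illustration, a pthread mutex plays the role of the rq lock, and the model
assumes the online flag is only ever changed with that lock held. It is
not the kernel implementation.

```c
/*
 * Userspace model only. struct toy_rq and the toy_* helpers are
 * hypothetical names for illustration; none of them are kernel APIs.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RQ_CAPACITY 8

struct toy_rq {
	pthread_mutex_t lock;		/* stands in for the rq lock */
	bool online;			/* stands in for rq->online */
	int tasks[TOY_RQ_CAPACITY];	/* queued task ids */
	size_t nr_tasks;
};

static void toy_rq_init(struct toy_rq *rq, bool online)
{
	pthread_mutex_init(&rq->lock, NULL);
	rq->online = online;
	rq->nr_tasks = 0;
}

/* Assumption of this model: the flag is only cleared with the lock held. */
static void toy_set_offline(struct toy_rq *rq)
{
	pthread_mutex_lock(&rq->lock);
	rq->online = false;
	pthread_mutex_unlock(&rq->lock);
}

/* Unconditional attach, similar in shape to attach_one_task(). */
static void toy_attach(struct toy_rq *rq, int task)
{
	pthread_mutex_lock(&rq->lock);
	assert(rq->nr_tasks < TOY_RQ_CAPACITY);
	rq->tasks[rq->nr_tasks++] = task;
	pthread_mutex_unlock(&rq->lock);
}

/*
 * Attach only if the queue is online. The check and the attach happen
 * under one lock acquisition, so the queue cannot be marked offline
 * between them. Returns false if the queue is offline.
 */
static bool toy_try_attach(struct toy_rq *rq, int task)
{
	bool result = true;

	pthread_mutex_lock(&rq->lock);
	if (!rq->online) {
		result = false;
		goto unlock;
	}
	assert(rq->nr_tasks < TOY_RQ_CAPACITY);
	rq->tasks[rq->nr_tasks++] = task;
unlock:
	pthread_mutex_unlock(&rq->lock);
	return result;
}

/*
 * Try the target queue first, then fall back to the local queue.
 * Returns the queue that received the task.
 */
static struct toy_rq *toy_attach_with_fallback(struct toy_rq *target,
					       struct toy_rq *local, int task)
{
	if (toy_try_attach(target, task))
		return target;
	toy_attach(local, task);
	return local;
}
```

In this model, calling toy_attach_with_fallback() after toy_set_offline()
on the target places the task on the local queue and leaves the offline
queue untouched, which mirrors the fallback to this_rq() in the hunk above.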
