Date: Mon, 01 Jul 2024 07:06:47 -0000
From: "tip-bot2 for Josh Don" <tip-bot2@...utronix.de>
To: linux-tip-commits@...r.kernel.org
Cc: Josh Don <joshdon@...gle.com>,
 "Peter Zijlstra (Intel)" <peterz@...radead.org>,
 Vincent Guittot <vincent.guittot@...aro.org>, x86@...nel.org,
 linux-kernel@...r.kernel.org
Subject: [tip: sched/urgent] Revert "sched/fair: Make sure to try to detach at
 least one movable task"

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     05f8404bcdf31822874bff41992336150240c109
Gitweb:        https://git.kernel.org/tip/05f8404bcdf31822874bff41992336150240c109
Author:        Josh Don <joshdon@...gle.com>
AuthorDate:    Thu, 20 Jun 2024 14:44:50 -07:00
Committer:     Peter Zijlstra <peterz@...radead.org>
CommitterDate: Tue, 25 Jun 2024 10:43:41 +02:00

Revert "sched/fair: Make sure to try to detach at least one movable task"

This reverts commit b0defa7ae03ecf91b8bfd10ede430cff12fcbd06.

b0defa7ae03ec changed the load balancing logic to ignore env.loop_max if
all tasks examined to that point were pinned. The goal of the patch was
to make it more likely to be able to detach a task buried in a long list
of pinned tasks. However, this has the unfortunate side effect of
creating an O(n) iteration in detach_tasks(), as we now must fully
iterate every task on a cpu if all or most are pinned. Since this load
balance code is done with rq lock held, and often in softirq context, it
is very easy to trigger hard lockups. We observed such hard lockups with
a user who affined O(10k) threads to a single cpu.
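
To make the cost described above concrete, here is a minimal user-space
sketch of the scan bound, not the kernel code itself. The names sim_env,
SIM_LBF_ALL_PINNED and sim_detach_scan() are hypothetical, and the model
assumes the scan stops at the first movable task and leaves out the
nr_migrate breather (LBF_NEED_BREAK). It only counts how many tasks one
call examines with and without the extra LBF_ALL_PINNED condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for LBF_ALL_PINNED; not the kernel's value. */
#define SIM_LBF_ALL_PINNED 0x01u

/* Hypothetical, heavily reduced stand-in for struct lb_env. */
struct sim_env {
	unsigned int loop;
	unsigned int loop_max;
	unsigned int flags;
};

/*
 * Walk a run queue modelled as an array of "pinned" flags and return
 * the number of tasks examined.
 *
 * reverted == true:  stop as soon as loop exceeds loop_max.
 * reverted == false: additionally require that a movable task has been
 *                    seen (the b0defa7ae03e behaviour).
 */
static unsigned int sim_detach_scan(const bool *pinned, size_t nr_tasks,
				    unsigned int loop_max, bool reverted)
{
	struct sim_env env = {
		.loop = 0,
		.loop_max = loop_max,
		.flags = SIM_LBF_ALL_PINNED, /* assume pinned until proven otherwise */
	};
	unsigned int examined = 0;

	assert(pinned != NULL || nr_tasks == 0);

	for (size_t i = 0; i < nr_tasks; i++) {
		env.loop++;
		if (reverted) {
			if (env.loop > env.loop_max)
				break;
		} else {
			if (env.loop > env.loop_max &&
			    !(env.flags & SIM_LBF_ALL_PINNED))
				break;
		}

		examined++;
		if (!pinned[i]) {
			/* Found a movable task; the sketch stops here. */
			env.flags &= ~SIM_LBF_ALL_PINNED;
			break;
		}
	}

	return examined;
}
```

Under these assumptions, with 10000 pinned tasks and a loop_max of 32
(an illustrative value), the reverted condition examines 32 tasks per
call, while the pre-revert condition examines all 10000.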

When I discussed this with Vincent he initially suggested that we keep
the limit on the number of tasks to detach, but increase the number of
tasks we can search. However, after some back and forth on the mailing
list, he recommended we instead revert the original patch, as it seems
likely no one was actually getting hit by the original issue.

Fixes: b0defa7ae03e ("sched/fair: Make sure to try to detach at least one movable task")
Signed-off-by: Josh Don <joshdon@...gle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@...aro.org>
Link: https://lore.kernel.org/r/20240620214450.316280-1-joshdon@google.com
---
 kernel/sched/fair.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8a5b1ae..24dda70 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9149,12 +9149,8 @@ static int detach_tasks(struct lb_env *env)
 			break;
 
 		env->loop++;
-		/*
-		 * We've more or less seen every task there is, call it quits
-		 * unless we haven't found any movable task yet.
-		 */
-		if (env->loop > env->loop_max &&
-		    !(env->flags & LBF_ALL_PINNED))
+		/* We've more or less seen every task there is, call it quits */
+		if (env->loop > env->loop_max)
 			break;
 
 		/* take a breather every nr_migrate tasks */
@@ -11393,9 +11389,7 @@ more_balance:
 
 		if (env.flags & LBF_NEED_BREAK) {
 			env.flags &= ~LBF_NEED_BREAK;
-			/* Stop if we tried all running tasks */
-			if (env.loop < busiest->nr_running)
-				goto more_balance;
+			goto more_balance;
 		}
 
 		/*
