Message-Id: <20251017-b4-sched-cfs-refactor-propagate-v1-1-1eb0dc5b19b3@os.amperecomputing.com>
Date: Fri, 17 Oct 2025 16:00:44 -0700
From: Shubhang Kaushik via B4 Relay <devnull+shubhang.os.amperecomputing.com@...nel.org>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, 
 Juri Lelli <juri.lelli@...hat.com>, 
 Vincent Guittot <vincent.guittot@...aro.org>, 
 Dietmar Eggemann <dietmar.eggemann@....com>, 
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, 
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, 
 Shubhang Kaushik <sh@...two.org>, 
 Shijie Huang <Shijie.Huang@...erecomputing.com>, 
 Frank Wang <zwang@...erecomputing.com>
Cc: Christopher Lameter <cl@...two.org>, 
 Adam Li <adam.li@...erecomputing.com>, linux-kernel@...r.kernel.org, 
 Shubhang Kaushik <shubhang@...amperecomputing.com>
Subject: [PATCH] sched/fair: Prefer cache-hot prev_cpu for wakeup

From: Shubhang Kaushik <shubhang@...amperecomputing.com>

Modify the wakeup path in `select_task_rq_fair()` to prioritize cache
locality for waking tasks. The previous fast path always attempted to
find an idle sibling, even if the task's prev CPU was not truly busy.

The original problem was that under some circumstances this could lead
to unnecessary task migrations away from a cache-hot core, even when
the task's prev CPU was a suitable candidate. The scheduler's internal
mechanism `cpu_overutilized()` provides an evaluation of CPU load.

To address this, the wakeup heuristic is updated to check the status of
the task's `prev_cpu` first:
- If the `prev_cpu` is not overutilized (as determined by
  `cpu_overutilized()`, via PELT), the task is woken up on
  its previous CPU. This leverages cache locality and avoids
  a potentially unnecessary migration.
- If the `prev_cpu` is considered busy or overutilized, the scheduler
  falls back to the existing behavior of searching for an idle sibling.

Signed-off-by: Shubhang Kaushik <shubhang@...amperecomputing.com>
---
This patch optimizes the scheduler's wakeup path to prioritize cache 
locality by keeping a task on its previous CPU if it is not overutilized,
falling back to a sibling search only when necessary.
---
 kernel/sched/fair.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bc0b7ce8a65d6bbe616953f530f7a02bb619537c..bb0d28d7d9872642cb5a4076caeb3ac9d8fe7bcd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8618,7 +8618,16 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 		new_cpu = sched_balance_find_dst_cpu(sd, p, cpu, prev_cpu, sd_flag);
 	} else if (wake_flags & WF_TTWU) { /* XXX always ? */
 		/* Fast path */
-		new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
+
+		/*
+		 * Avoid waking the task on an overutilized CPU.
+		 * If the previous CPU is not overloaded, keep the task there
+		 * for cache locality; otherwise, search for an idle sibling.
+		 */
+		if (!cpu_overutilized(prev_cpu))
+			new_cpu = prev_cpu;
+		else
+			new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
 	}
 	rcu_read_unlock();
 

---
base-commit: 9b332cece987ee1790b2ed4c989e28162fa47860
change-id: 20251017-b4-sched-cfs-refactor-propagate-2c4a820998a4

Best regards,
-- 
Shubhang Kaushik <shubhang@...amperecomputing.com>


