Message-Id: <20251017-b4-sched-cfs-refactor-propagate-v1-1-1eb0dc5b19b3@os.amperecomputing.com>
Date: Fri, 17 Oct 2025 16:00:44 -0700
From: Shubhang Kaushik via B4 Relay <devnull+shubhang.os.amperecomputing.com@...nel.org>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Shubhang Kaushik <sh@...two.org>,
Shijie Huang <Shijie.Huang@...erecomputing.com>,
Frank Wang <zwang@...erecomputing.com>
Cc: Christopher Lameter <cl@...two.org>,
Adam Li <adam.li@...erecomputing.com>, linux-kernel@...r.kernel.org,
Shubhang Kaushik <shubhang@...amperecomputing.com>
Subject: [PATCH] sched/fair: Prefer cache-hot prev_cpu for wakeup
From: Shubhang Kaushik <shubhang@...amperecomputing.com>
Modify the wakeup path in `select_task_rq_fair()` to prioritize cache
locality for waking tasks. The previous fast path always attempted to
find an idle sibling, even if the task's prev CPU was not truly busy.
Under some circumstances this could lead to unnecessary task
migrations away from a cache-hot core, even when the task's prev CPU
was a suitable candidate. The scheduler's internal helper
`cpu_overutilized()` provides a PELT-based evaluation of CPU load.
To address this, the wakeup heuristic is updated to check the status of
the task's `prev_cpu` first:
- If the `prev_cpu` is not overutilized (as determined by
`cpu_overutilized()`, via PELT), the task is woken up on
its previous CPU. This leverages cache locality and avoids
a potentially unnecessary migration.
- If the `prev_cpu` is considered busy or overutilized, the scheduler
falls back to the existing behavior of searching for an idle sibling.
Signed-off-by: Shubhang Kaushik <shubhang@...amperecomputing.com>
---
This patch optimizes the scheduler's wakeup path to prioritize cache
locality by keeping a task on its previous CPU if it is not overutilized,
falling back to a sibling search only when necessary.
---
kernel/sched/fair.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bc0b7ce8a65d6bbe616953f530f7a02bb619537c..bb0d28d7d9872642cb5a4076caeb3ac9d8fe7bcd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8618,7 +8618,16 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
new_cpu = sched_balance_find_dst_cpu(sd, p, cpu, prev_cpu, sd_flag);
} else if (wake_flags & WF_TTWU) { /* XXX always ? */
/* Fast path */
- new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
+
+ /*
+ * Avoid waking the task on an overutilized CPU.
+ * If the previous CPU is not overutilized, keep the task there
+ * for cache locality. Otherwise, search for an idle sibling.
+ */
+ if (!cpu_overutilized(prev_cpu))
+ new_cpu = prev_cpu;
+ else
+ new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
}
rcu_read_unlock();
---
base-commit: 9b332cece987ee1790b2ed4c989e28162fa47860
change-id: 20251017-b4-sched-cfs-refactor-propagate-2c4a820998a4
Best regards,
--
Shubhang Kaushik <shubhang@...amperecomputing.com>