[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20250317082803.3071809-1-joelagnelf@nvidia.com>
Date: Mon, 17 Mar 2025 04:28:02 -0400
From: Joel Fernandes <joelagnelf@...dia.com>
To: linux-kernel@...r.kernel.org,
Tejun Heo <tj@...nel.org>,
David Vernet <void@...ifault.com>,
Andrea Righi <arighi@...dia.com>,
Changwoo Min <changwoo@...lia.com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>
Cc: Joel Fernandes <joelagnelf@...dia.com>
Subject: [PATCH RFC] sched_ext: Choose prev_cpu if idle and cache affine without WF_SYNC
Consider that the previous CPU is cache affined to the waker's CPU and
is idle. Currently, scx's default select function only selects the
previous CPU in this case if WF_SYNC request is also made to wakeup on the
waker's CPU.
This means, without WF_SYNC, the previous CPU being cache affined to the
waker and is idle is not considered. This seems extreme. WF_SYNC is not
normally passed to the wakeup path outside of some IPC drivers but it is
very possible that the task is cache hot on previous CPU and shares
cache with the waker CPU. Lets avoid too many migrations and select the
previous CPU in such cases.
This change is consistent with the fair scheduler's behavior as well. In
select_idle_sibling(), the previous CPU is selected if it is cache
affined with the target. This is done regardless of WF_SYNC and before
any scanning of fully idle cores is done.
One difference still exists though between SCX and CFS in this regard, in CFS
we first check if the target CPU is idle before checking if the previous CPU is
idle. However that could be a matter of choice and in the future, that behavior
could also be unified.
Signed-off-by: Joel Fernandes <joelagnelf@...dia.com>
---
kernel/sched/ext.c | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 5a81d9a1e31f..3b1a489e2aaf 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3479,7 +3479,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
{
const struct cpumask *llc_cpus = NULL;
const struct cpumask *numa_cpus = NULL;
- s32 cpu;
+ s32 cpu = smp_processor_id();
*found = false;
@@ -3507,22 +3507,20 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
llc_cpus = llc_span(prev_cpu);
}
+ /*
+ * If the waker's CPU is cache affine and prev_cpu is idle, then avoid
+ * a migration.
+ */
+ if (cpus_share_cache(cpu, prev_cpu) &&
+ test_and_clear_cpu_idle(prev_cpu)) {
+ cpu = prev_cpu;
+ goto cpu_found;
+ }
+
/*
* If WAKE_SYNC, try to migrate the wakee to the waker's CPU.
*/
if (wake_flags & SCX_WAKE_SYNC) {
- cpu = smp_processor_id();
-
- /*
- * If the waker's CPU is cache affine and prev_cpu is idle,
- * then avoid a migration.
- */
- if (cpus_share_cache(cpu, prev_cpu) &&
- test_and_clear_cpu_idle(prev_cpu)) {
- cpu = prev_cpu;
- goto cpu_found;
- }
-
/*
* If the waker's local DSQ is empty, and the system is under
* utilized, try to wake up @p to the local DSQ of the waker.
--
2.43.0
Powered by blists - more mailing lists