Message-Id: <20250310074044.3656-3-wuyun.abel@bytedance.com>
Date: Mon, 10 Mar 2025 15:40:42 +0800
From: Abel Wu <wuyun.abel@...edance.com>
To: K Prateek Nayak <kprateek.nayak@....com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Josh Don <joshdon@...gle.com>,
Tianchen Ding <dtcccc@...ux.alibaba.com>
Cc: Abel Wu <wuyun.abel@...edance.com>,
linux-kernel@...r.kernel.org (open list:SCHEDULER)
Subject: [RFC PATCH 2/2] sched/fair: Do not special-case SCHED_IDLE cpus in select slowpath
A SCHED_IDLE cgroup, i.e. one whose cpu.idle is set to 1, is only
idle relative to its siblings due to the cgroup hierarchical
behavior. So a SCHED_IDLE cpu does NOT necessarily imply any of the
following (a rough sketch of the check follows the list):
- It is a less loaded cpu (the parent of its topmost idle
  ancestor could be a 'giant' entity with a large cpu.weight).
- It can be expected to be preempted soon enough by a newly woken
  task (that actually depends on the wakee's and the current
  task's ancestors under their common parent).
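For reference, here is a rough user-space model of what the
SCHED_IDLE-cpu test amounts to (loosely paraphrasing
sched_idle_rq()/sched_idle_cpu() in kernel/sched/fair.c; the struct,
field and function names below are made up for illustration, this is
not kernel code). The point is that the test only asks whether every
runnable CFS task on the cpu sits in an idle hierarchy; neither the
ancestors' cpu.weight nor the cpu's load enters the picture:

#include <stdio.h>

struct rq_model {
	unsigned int nr_running;	/* runnable CFS tasks */
	unsigned int idle_h_nr_running;	/* ...of which are in an idle hierarchy */
};

/* "SCHED_IDLE cpu": all runnable tasks belong to an idle hierarchy */
static int sched_idle_rq_model(const struct rq_model *rq)
{
	return rq->nr_running && rq->nr_running == rq->idle_h_nr_running;
}

int main(void)
{
	/* e.g. 8 heavy tasks in a cpu.idle=1 leaf under a 'giant' parent */
	struct rq_model rq = { .nr_running = 8, .idle_h_nr_running = 8 };

	printf("SCHED_IDLE cpu? %d\n", sched_idle_rq_model(&rq)); /* prints 1 */
	return 0;
}
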
Since a less loaded cpu can probably serve the newly woken task
better, and the same holds among SCHED_IDLE cpus (a less loaded
SCHED_IDLE cpu is likely easier and faster to preempt), let's not
special-case SCHED_IDLE cpus, at least in the select slowpath.
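To make the resulting policy concrete, below is a small stand-alone
sketch of the selection loop as it looks after this patch (user-space
toy code with made-up types; the cpuidle exit-latency handling is
simplified compared to the real sched_balance_find_dst_group_cpu()):
an idle cpu with the shallowest idle state wins, the most recently
idle cpu breaks ties, and otherwise the least loaded cpu is picked,
no matter whether it only runs SCHED_IDLE tasks.

#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct cpu_stat {
	bool idle;			/* cpu is fully idle */
	unsigned int exit_latency;	/* idle-state exit latency */
	uint64_t idle_stamp;		/* when the cpu went idle */
	unsigned long load;		/* cpu_load() when busy */
};

static int find_dst_group_cpu(const struct cpu_stat *cpus, int nr, int this_cpu)
{
	unsigned int min_exit_latency = UINT_MAX;
	unsigned long min_load = ULONG_MAX;
	uint64_t latest_idle_timestamp = 0;
	int shallowest_idle_cpu = -1;
	int least_loaded_cpu = this_cpu;

	for (int i = 0; i < nr; i++) {
		if (cpus[i].idle) {
			/* prefer the shallowest idle state, then the most
			 * recently idled cpu */
			if (cpus[i].exit_latency < min_exit_latency ||
			    (cpus[i].exit_latency == min_exit_latency &&
			     cpus[i].idle_stamp > latest_idle_timestamp)) {
				min_exit_latency = cpus[i].exit_latency;
				latest_idle_timestamp = cpus[i].idle_stamp;
				shallowest_idle_cpu = i;
			}
		} else if (shallowest_idle_cpu == -1) {
			/* busy cpus, SCHED_IDLE or not, compete on load */
			if (cpus[i].load < min_load) {
				min_load = cpus[i].load;
				least_loaded_cpu = i;
			}
		}
	}

	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
}

int main(void)
{
	/* no idle cpu: cpu1 only runs SCHED_IDLE tasks but is busier than cpu2 */
	struct cpu_stat cpus[] = {
		{ .idle = false, .load = 900 },	/* cpu0 (this_cpu) */
		{ .idle = false, .load = 700 },	/* cpu1: SCHED_IDLE leaf, heavy parent */
		{ .idle = false, .load = 100 },	/* cpu2 */
	};

	printf("dst cpu = %d\n", find_dst_group_cpu(cpus, 3, 0)); /* prints 2 */
	return 0;
}
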
Signed-off-by: Abel Wu <wuyun.abel@...edance.com>
---
kernel/sched/fair.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 379764bd2795..769505cf519b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7446,7 +7446,7 @@ sched_balance_find_dst_group_cpu(struct sched_group *group, struct task_struct *
unsigned int min_exit_latency = UINT_MAX;
u64 latest_idle_timestamp = 0;
int least_loaded_cpu = this_cpu;
- int shallowest_idle_cpu = -1, si_cpu = -1;
+ int shallowest_idle_cpu = -1;
int i;
/* Check if we have any choice: */
@@ -7481,12 +7481,13 @@ sched_balance_find_dst_group_cpu(struct sched_group *group, struct task_struct *
latest_idle_timestamp = rq->idle_stamp;
shallowest_idle_cpu = i;
}
- } else if (shallowest_idle_cpu == -1 && si_cpu == -1) {
- if (sched_idle_cpu(i)) {
- si_cpu = i;
- continue;
- }
-
+ } else if (shallowest_idle_cpu == -1) {
+ /*
+ * The SCHED_IDLE cpus do not necessarily mean anything
+ * to @p due to the cgroup hierarchical behavior. But it
+ * is almost certain that the wakee will be served better
+ * if the cpu is less loaded.
+ */
load = cpu_load(cpu_rq(i));
if (load < min_load) {
min_load = load;
@@ -7495,11 +7496,7 @@ sched_balance_find_dst_group_cpu(struct sched_group *group, struct task_struct *
}
}
- if (shallowest_idle_cpu != -1)
- return shallowest_idle_cpu;
- if (si_cpu != -1)
- return si_cpu;
- return least_loaded_cpu;
+ return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
}
static inline int sched_balance_find_dst_cpu(struct sched_domain *sd, struct task_struct *p,
--
2.37.3