[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250910095044.278-5-ziqianlu@bytedance.com>
Date: Wed, 10 Sep 2025 17:50:44 +0800
From: Aaron Lu <ziqianlu@...edance.com>
To: Valentin Schneider <vschneid@...hat.com>,
Ben Segall <bsegall@...gle.com>,
K Prateek Nayak <kprateek.nayak@....com>,
Peter Zijlstra <peterz@...radead.org>,
Chengming Zhou <chengming.zhou@...ux.dev>,
Josh Don <joshdon@...gle.com>,
Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Xi Wang <xii@...gle.com>
Cc: linux-kernel@...r.kernel.org,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Mel Gorman <mgorman@...e.de>,
Chuyi Zhou <zhouchuyi@...edance.com>,
Jan Kiszka <jan.kiszka@...mens.com>,
Florian Bezdeka <florian.bezdeka@...mens.com>,
Songtang Liu <liusongtang@...edance.com>,
Chen Yu <yu.c.chen@...el.com>,
Matteo Martelli <matteo.martelli@...ethink.co.uk>,
Michal Koutný <mkoutny@...e.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: [PATCH 4/4] sched/fair: Do not balance task to a throttled cfs_rq
When doing load balance and the target cfs_rq is in throttled hierarchy,
whether to allow balancing there is a question.
The good side to allow balancing is: if the target CPU is idle or less
loaded and the being balanced task is holding some kernel resources,
then it seems a good idea to balance the task there and let the task get
the CPU earlier and release kernel resources sooner. The bad part is, if
the task is not holding any kernel resources, then the balance seems not
that useful.
While theoretically it's debatable, a performance test[0] which involves
200 cgroups and each cgroup runs hackbench(20 sender, 20 receiver) in
pipe mode showed a performance degradation on AMD Genoa when allowing
load balance to throttled cfs_rq. Analysis[1] showed hackbench doesn't
like task migration across LLC boundary. For this reason, add a check in
can_migrate_task() to forbid balancing to a cfs_rq that is in throttled
hierarchy. This reduced task migration a lot and performance restored.
[0]: https://lore.kernel.org/lkml/20250822110701.GB289@bytedance/
[1]: https://lore.kernel.org/lkml/20250903101102.GB42@bytedance/
Signed-off-by: Aaron Lu <ziqianlu@...edance.com>
---
kernel/sched/fair.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3dbdfaa697477..00ee59993b6a3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9369,14 +9369,19 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
/*
* We do not migrate tasks that are:
* 1) delayed dequeued unless we migrate load, or
- * 2) cannot be migrated to this CPU due to cpus_ptr, or
- * 3) running (obviously), or
- * 4) are cache-hot on their current CPU, or
- * 5) are blocked on mutexes (if SCHED_PROXY_EXEC is enabled)
+ * 2) target cfs_rq is in throttled hierarchy, or
+ * 3) cannot be migrated to this CPU due to cpus_ptr, or
+ * 4) running (obviously), or
+ * 5) are cache-hot on their current CPU, or
+ * 6) are blocked on mutexes (if SCHED_PROXY_EXEC is enabled)
*/
if ((p->se.sched_delayed) && (env->migration_type != migrate_load))
return 0;
+ if (task_group(p) &&
+ throttled_hierarchy(task_group(p)->cfs_rq[env->dst_cpu]))
+ return 0;
+
/*
* We want to prioritize the migration of eligible tasks.
* For ineligible tasks we soft-limit them and only allow
--
2.39.5
Powered by blists - more mailing lists