Message-ID: <20250904041516.3046-17-kprateek.nayak@amd.com>
Date: Thu, 4 Sep 2025 04:15:12 +0000
From: K Prateek Nayak <kprateek.nayak@....com>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Anna-Maria Behnsen <anna-maria@...utronix.de>,
Frederic Weisbecker <frederic@...nel.org>, Thomas Gleixner
<tglx@...utronix.de>, <linux-kernel@...r.kernel.org>
CC: Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
<rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, K Prateek Nayak
<kprateek.nayak@....com>, "Gautham R. Shenoy" <gautham.shenoy@....com>,
Swapnil Sapkal <swapnil.sapkal@....com>
Subject: [RFC PATCH 16/19] sched/fair: Convert sched_balance_nohz_idle() to use nohz_shared_list
Convert the main nohz idle load balancing loop in
sched_balance_nohz_idle() to use the distributed nohz idle tracking
mechanism via "nohz_shared_list".
The nifty trick of balancing the nohz owner at the very end using
for_each_cpu_wrap() is lost in this transition; the balancing CPU is
instead handled explicitly after the list walk. Special care is taken
to ensure nohz.{needs_update,has_blocked} are set correctly for a
reattempt if the balancing CPU turns busy towards the end of nohz
balancing, preserving the current behavior.
Signed-off-by: K Prateek Nayak <kprateek.nayak@....com>
---
kernel/sched/fair.c | 62 ++++++++++++++++++++++++++++++++++-----------
1 file changed, 47 insertions(+), 15 deletions(-)
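
Illustrative note (not part of the patch): below is a minimal userspace
sketch of the two-level walk this change introduces, iterating the shared
domains, balancing every idle CPU except the owner, and handling the owner
last with an early bail-out when it turns busy. All names in it
(struct shared_domain, balance_one(), cpu_idle[], NR_CPUS) are hypothetical
stand-ins for illustration only, not kernel APIs.

/*
 * Simplified userspace analogy of the iteration pattern introduced by
 * this patch. Names below are hypothetical stand-ins, not kernel APIs.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8

struct shared_domain {
	int nr_idle_cpus;		/* analogous to sds->nr_idle_cpus */
	bool idle_mask[NR_CPUS];	/* analogous to sds->idle_cpus_mask */
};

/* Stand-in for sched_balance_idle_rq(): pretend to pull load for @cpu. */
static unsigned int balance_one(int cpu)
{
	printf("balancing CPU %d\n", cpu);
	return 0;
}

/*
 * Walk every shared domain, balance all idle CPUs except the owner, and
 * balance the owner last so the other idle CPUs get a chance to pull
 * load first. Return -1 (mirroring -EBUSY) if the owner turns busy.
 */
static int balance_all(struct shared_domain *doms, int nr_doms,
		       const bool *cpu_idle, int owner)
{
	unsigned int update_flags = 0;

	for (int d = 0; d < nr_doms; d++) {
		struct shared_domain *sd = &doms[d];

		if (!sd->nr_idle_cpus)		/* no idle CPUs here, skip */
			continue;

		for (int cpu = 0; cpu < NR_CPUS; cpu++) {
			if (cpu == owner || !sd->idle_mask[cpu])
				continue;

			if (!cpu_idle[owner])	/* owner has work: stop early */
				return -1;

			update_flags |= balance_one(cpu);
		}
	}

	/* Owner last; if it went busy meanwhile, ask the caller to retry. */
	if (!cpu_idle[owner])
		return -1;

	update_flags |= balance_one(owner);
	return update_flags;
}

int main(void)
{
	bool idle[NR_CPUS] = { [1] = true, [2] = true, [5] = true };
	struct shared_domain doms[2] = {
		{ .nr_idle_cpus = 2, .idle_mask = { [1] = true, [2] = true } },
		{ .nr_idle_cpus = 1, .idle_mask = { [5] = true } },
	};

	balance_all(doms, 2, idle, /*owner=*/1);
	return 0;
}

With owner = 1 in the example, CPUs 2 and 5 are balanced first and CPU 1
last, mirroring the owner-last behavior described in the changelog.
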
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d309cb73d428..c7ac8e7094ed 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12685,27 +12685,59 @@ static int sched_balance_nohz_idle(int balancing_cpu, unsigned int flags, unsign
 {
 	/* Earliest time when we have to do rebalance again */
 	unsigned long next_balance = start + 60*HZ;
+	struct sched_domain_shared *sds;
 	unsigned int update_flags = 0;
-	int target_cpu;
 
-	/*
-	 * Start with the next CPU after the balancing CPU so we will end with
-	 * balancing CPU and let a chance for other idle cpu to pull load.
-	 */
-	for_each_cpu_wrap(target_cpu, nohz.idle_cpus_mask, balancing_cpu + 1) {
-		if (!idle_cpu(target_cpu))
+	rcu_read_lock();
+	list_for_each_entry_rcu(sds, &nohz_shared_list, nohz_list_node) {
+		int target_cpu;
+
+		/* No idle CPUs in this domain */
+		if (!atomic_read(&sds->nr_idle_cpus))
 			continue;
 
-		/*
-		 * If balancing CPU gets work to do, stop the load balancing
-		 * work being done for other CPUs. Next load balancing owner
-		 * will pick it up.
-		 */
-		if (!idle_cpu(balancing_cpu) && need_resched())
-			return -EBUSY;
+		for_each_cpu(target_cpu, sds->idle_cpus_mask) {
+			/* Deal with the balancing CPU at the end. */
+			if (balancing_cpu == target_cpu)
+				continue;
+
+			if (!idle_cpu(target_cpu))
+				continue;
 
-		update_flags |= sched_balance_idle_rq(cpu_rq(target_cpu), flags, &next_balance);
+			/*
+			 * If balancing CPU gets work to do, stop the load balancing
+			 * work being done for other CPUs. Next load balancing owner
+			 * will pick it up.
+			 */
+			if (!idle_cpu(balancing_cpu) && need_resched()) {
+				rcu_read_unlock();
+				return -EBUSY;
+			}
+
+			update_flags |= sched_balance_idle_rq(cpu_rq(target_cpu),
+							      flags, &next_balance);
+		}
 	}
+	rcu_read_unlock();
+
+	/*
+	 * If we reach here, all CPUs have been balanced and it is time
+	 * to balance the balancing_cpu.
+	 *
+	 * If coincidentally the balancing CPU turns busy at this point
+	 * and is the only nohz idle CPU, we still need to set
+	 * nohz.{needs_update,has_blocked} since the CPU can transition
+	 * back to nohz idle before the tick hits.
+	 *
+	 * In the above case, rq->nohz_tick_stopped is never cleared and
+	 * nohz_balance_enter_idle() skips setting nohz.has_blocked.
+	 * Return -EBUSY instructing the caller to reset the nohz
+	 * signals allowing a reattempt.
+	 */
+	if (!idle_cpu(balancing_cpu) && need_resched())
+		return -EBUSY;
+
+	update_flags |= sched_balance_idle_rq(cpu_rq(balancing_cpu), flags, &next_balance);
 
 	/*
 	 * next_balance will be updated only when there is a need.
--
2.34.1