Message-ID: <20251028104255.1892485-1-srikar@linux.ibm.com>
Date: Tue, 28 Oct 2025 16:12:54 +0530
From: Srikar Dronamraju <srikar@...ux.ibm.com>
To: linux-kernel@...r.kernel.org
Cc: Michael Ellerman <mpe@...erman.id.au>,
Madhavan Srinivasan <maddy@...ux.ibm.com>,
linuxppc-dev@...ts.ozlabs.org, Ben Segall <bsegall@...gle.com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Mel Gorman <mgorman@...e.de>, Nicholas Piggin <npiggin@...il.com>,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
Valentin Schneider <vschneid@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Srikar Dronamraju <srikar@...ux.ibm.com>
Subject: [PATCH 1/2] sched: Feature to decide if steal should update CPU capacity
At present, the scheduler scales the CPU capacity available to fair
tasks based on the time spent handling irqs and on steal time. If a
CPU sees irq or steal time, its capacity for fair tasks decreases,
causing tasks to migrate to other CPUs that are not affected by irq
or steal time. All of this is gated by the NONTASK_CAPACITY scheduler
feature.
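
For reference, the resulting reduction in fair capacity is roughly
proportional: the irq/steal utilization tracked via
update_irq_load_avg() is scaled out of the CPU's raw capacity
(scale_rt_capacity()/scale_irq_capacity() in kernel/sched implement
this). A simplified sketch, not the exact mainline code:

  /*
   * Simplified sketch (hypothetical helper, not mainline code) of how
   * irq/steal utilization eats into the capacity seen by fair tasks:
   * whatever update_irq_load_avg() accumulates is removed, proportionally,
   * from the CPU's raw capacity, so a CPU with visible steal time looks
   * "smaller" to the load balancer.
   */
  static unsigned long fair_capacity_sketch(unsigned long max,
                                            unsigned long rt_dl_util,
                                            unsigned long irq_steal_util)
  {
          unsigned long free;

          if (rt_dl_util >= max || irq_steal_util >= max)
                  return 1;               /* effectively nothing left for fair tasks */

          free = max - rt_dl_util;        /* room left after RT/DL pressure */

          /* scale out the irq/steal share, as scale_irq_capacity() does */
          return free * (max - irq_steal_util) / max;
  }
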
In virtualized setups, a CPU that reports steal time (time during
which the hypervisor ran something other than this vCPU) can cause
tasks to migrate unnecessarily to sibling CPUs that appear less busy,
only for the situation to reverse shortly afterwards.

To mitigate this ping-pong behaviour, introduce a new scheduler
feature flag, ACCT_STEAL, which controls whether steal time
contributes to the non-task capacity adjustment used for fair
scheduling. The flag defaults to true, so the current behaviour is
unchanged unless it is cleared.
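
The steal_updates_cpu_capacity() helper added below lets architecture
or paravirt code choose the policy at boot instead of relying on the
default. As an illustration only (the call site and the bursty-steal
predicate are hypothetical and not part of this series), a guest
could opt out along these lines:

  /* Hypothetical call site, e.g. in an arch's paravirt setup code. */
  #include <linux/init.h>
  #include <linux/sched.h>

  static void __init guest_tune_steal_accounting(void)
  {
          /*
           * Ask the scheduler not to fold steal time into the non-task
           * capacity adjustment; irq time is still accounted as before.
           */
          if (guest_steal_time_is_bursty())       /* hypothetical predicate */
                  steal_updates_cpu_capacity(false);
  }

The feature can also be toggled at run time through the scheduler
features debugfs file (/sys/kernel/debug/sched/features), like any
other SCHED_FEAT entry.
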
Signed-off-by: Srikar Dronamraju <srikar@...ux.ibm.com>
---
include/linux/sched.h | 1 +
kernel/sched/core.c | 7 +++++--
kernel/sched/debug.c | 8 ++++++++
kernel/sched/features.h | 1 +
4 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index aa9c5be7a632..451931cce5bf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2272,5 +2272,6 @@ static __always_inline void alloc_tag_restore(struct alloc_tag *tag, struct allo
#define alloc_tag_save(_tag) NULL
#define alloc_tag_restore(_tag, _old) do {} while (0)
#endif
+extern void steal_updates_cpu_capacity(bool enable);
#endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 81c6df746df1..3a7c4e307371 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -792,8 +792,11 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
rq->clock_task += delta;
#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
- if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
- update_irq_load_avg(rq, irq_delta + steal);
+ if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY)) {
+ if (steal && sched_feat(ACCT_STEAL))
+ irq_delta += steal;
+ update_irq_load_avg(rq, irq_delta);
+ }
#endif
update_rq_clock_pelt(rq, delta);
}
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 557246880a7e..a0393dd43bb2 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1307,3 +1307,11 @@ void resched_latency_warn(int cpu, u64 latency)
cpu, latency, cpu_rq(cpu)->ticks_without_resched);
dump_stack();
}
+
+void steal_updates_cpu_capacity(bool enable)
+{
+ if (enable)
+ sched_feat_set("ACCT_STEAL");
+ else
+ sched_feat_set("NO_ACCT_STEAL");
+}
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 3c12d9f93331..82d7806ea515 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -121,3 +121,4 @@ SCHED_FEAT(WA_BIAS, true)
SCHED_FEAT(UTIL_EST, true)
SCHED_FEAT(LATENCY_WARN, false)
+SCHED_FEAT(ACCT_STEAL, true)
--
2.47.3