Message-ID: <8ec843b6-ac7d-4cef-a0b1-12b85470fde8@linux.ibm.com>
Date: Tue, 28 Oct 2025 20:35:05 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Srikar Dronamraju <srikar@...ux.ibm.com>, linux-kernel@...r.kernel.org
Cc: Michael Ellerman <mpe@...erman.id.au>,
Madhavan Srinivasan <maddy@...ux.ibm.com>,
linuxppc-dev@...ts.ozlabs.org, Ben Segall <bsegall@...gle.com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Mel Gorman <mgorman@...e.de>, Nicholas Piggin <npiggin@...il.com>,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
Valentin Schneider <vschneid@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH 1/2] sched: Feature to decide if steal should update CPU capacity
On 10/28/25 4:12 PM, Srikar Dronamraju wrote:
> At present, the scheduler scales CPU capacity for fair tasks based on time
> spent on irq and steal time. If a CPU sees irq or steal time, its
> capacity for fair tasks decreases, causing tasks to migrate to other CPUs
> that are not affected by irq and steal time. All of this is gated by
> NONTASK_CAPACITY.
>
> In virtualized setups, a CPU that reports steal time (time taken by the
> hypervisor) can cause tasks to migrate unnecessarily to sibling CPUs that
> appear to be less busy, only for the situation to reverse shortly.
>
> To mitigate this ping-pong behaviour, this change introduces a new
> scheduler feature flag: ACCT_STEAL which will control whether steal time
> contributes to non-task capacity adjustments (used for fair scheduling).
>
> Signed-off-by: Srikar Dronamraju <srikar@...ux.ibm.com>
> ---
> include/linux/sched.h | 1 +
> kernel/sched/core.c | 7 +++++--
> kernel/sched/debug.c | 8 ++++++++
> kernel/sched/features.h | 1 +
> 4 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index aa9c5be7a632..451931cce5bf 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2272,5 +2272,6 @@ static __always_inline void alloc_tag_restore(struct alloc_tag *tag, struct allo
> #define alloc_tag_save(_tag) NULL
> #define alloc_tag_restore(_tag, _old) do {} while (0)
> #endif
> +extern void steal_updates_cpu_capacity(bool enable);
>
> #endif
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 81c6df746df1..3a7c4e307371 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -792,8 +792,11 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
> rq->clock_task += delta;
>
> #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
Curious to know whether there are users or distros that build with CONFIG_HAVE_SCHED_AVG_IRQ=n.
> -	if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
> -		update_irq_load_avg(rq, irq_delta + steal);
> +	if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY)) {
> +		if (steal && sched_feat(ACCT_STEAL))
> +			irq_delta += steal;
> +		update_irq_load_avg(rq, irq_delta);
> +	}
> #endif
> 	update_rq_clock_pelt(rq, delta);
> }
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 557246880a7e..a0393dd43bb2 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -1307,3 +1307,11 @@ void resched_latency_warn(int cpu, u64 latency)
> cpu, latency, cpu_rq(cpu)->ticks_without_resched);
> dump_stack();
> }
> +
> +void steal_updates_cpu_capacity(bool enable)
> +{
> +	if (enable)
> +		sched_feat_set("ACCT_STEAL");
> +	else
> +		sched_feat_set("NO_ACCT_STEAL");
> +}
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index 3c12d9f93331..82d7806ea515 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -121,3 +121,4 @@ SCHED_FEAT(WA_BIAS, true)
> SCHED_FEAT(UTIL_EST, true)
>
> SCHED_FEAT(LATENCY_WARN, false)
> +SCHED_FEAT(ACCT_STEAL, true)
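
To make the effect of the two flags concrete, here is a minimal user-space sketch of the patched branch in update_rq_clock_task(). It is not kernel code: struct feat_model and nontask_delta() are hypothetical names, the two booleans stand in for sched_feat(NONTASK_CAPACITY) and sched_feat(ACCT_STEAL), and the int64_t types are an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two scheduler feature flags. */
struct feat_model {
	bool nontask_capacity;	/* models sched_feat(NONTASK_CAPACITY) */
	bool acct_steal;	/* models sched_feat(ACCT_STEAL) */
};

/*
 * Models the patched branch: returns true if update_irq_load_avg()
 * would be called, and stores the delta it would receive in *out.
 * *out is left untouched when the call would be skipped.
 */
static bool nontask_delta(const struct feat_model *f, int64_t irq_delta,
			  int64_t steal, int64_t *out)
{
	/* Outer gate is unchanged by the patch: it still looks at irq + steal. */
	if (!((irq_delta + steal) && f->nontask_capacity))
		return false;

	/* Steal is folded in only when ACCT_STEAL is set. */
	if (steal && f->acct_steal)
		irq_delta += steal;

	*out = irq_delta;
	return true;
}
```

Under this model, irq_delta = 3 and steal = 5 gives a delta of 8 with ACCT_STEAL and 3 with NO_ACCT_STEAL. With irq_delta = 0 and steal = 5 under NO_ACCT_STEAL, the outer check still passes, so update_irq_load_avg() would be called with a delta of 0.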