Message-ID: <20251223184928.3uqacp5xgzb3jemp@airbuntu>
Date: Tue, 23 Dec 2025 18:49:28 +0000
From: Qais Yousef <qyousef@...alina.io>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Samuel Wu <wusamuel@...gle.com>, mingo@...hat.com, peterz@...radead.org,
juri.lelli@...hat.com, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
vschneid@...hat.com, linux-kernel@...r.kernel.org,
Android Kernel Team <kernel-team@...roid.com>
Subject: Re: [PATCH] sched/fair: Fix pelt lost idle time detection

On 12/23/25 17:27, Qais Yousef wrote:
> On 12/13/25 04:54, Vincent Guittot wrote:
>
> > > For completeness, here are some Perfetto traces that show threads
> > > running, CPU frequency, and PELT related stats. I've pinned the
> > > util_avg track for a CPU on the little cluster, as the util_avg metric
> > > shows an obvious increase (~66 with the patch vs ~3 without the
> > > patch).
> >
> > I was focusing on the update of rq->lost_idle_time, but it can't be
> > related because the CPUs are often idle in your trace. But the patch also
> > updates rq->clock_idle and rq->clock_pelt_idle, which are used to
> > sync cfs task util_avg at wakeup when it is about to migrate and the prev
> > cpu is idle.
> >
> > Before the patch we could have an old clock_pelt_idle and clock_idle that
> > were used to decay the util_avg of cfs tasks before migrating them,
> > which would end up decaying util_avg too much.
> >
> > But I noticed that you pinned util_avg_rt, which doesn't use the 2
> > fields above in mainline. Does the Android kernel make some changes for RT
> > util_avg tracking?
>
> We shouldn't be doing that. I think we were not updating RT pressure correctly
> before the patch. The new values make more sense to me: RT tasks are running
> 2ms every 10ms, so a util_avg_rt in the ~150 range makes more sense than the
> previous values of 5-6. If we add the 20% headroom, that can easily saturate
> the little core.
>
> update_rt_rq_load_avg() uses rq_clock_pelt() which takes into account the
> lost_idle_time which we now ensure is updated in this corner case?
>
> I guess the first question is which do you think is the right behavior for the
> RT pressure?
>
> And 2nd question, does it make sense to take RT pressure into account in
> schedutil if there are no fair tasks? It is supposed to help compensate for the
> time stolen by RT so we make fair run faster. But if there are no fair tasks,
> the RT pressure is meaningless on its own, as RT tasks should run at max or at
> whatever value is specified by uclamp_min? I think in this test uclamp_min is
> set to 0 by default for RT, so RT tasks are not expected to raise the frequency
> on their own.
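
As a rough sanity check on the RT numbers quoted above, here is a toy userspace model of a PELT-style geometric average for a task that runs 2ms out of every 10ms. It is only a sketch under stated assumptions: it uses 1ms steps instead of the 1024us PELT period, ignores frequency and capacity invariance and everything rq_clock_pelt() does, and pelt_mean_util() is a hypothetical helper, not a kernel function. In this idealized model the steady-state mean lands at roughly 20% of 1024, which is the same order of magnitude as the ~150 range and nowhere near 5-6.

```c
#include <assert.h>

/* Per-step decay factor y, chosen so that y^32 = 0.5 (32-step half-life). */
#define PELT_Y   0.97857206
/* Maximum utilization value (SCHED_CAPACITY_SCALE in the kernel). */
#define PELT_MAX 1024.0

/*
 * Hypothetical helper: mean utilization over one cycle for a task that
 * runs run_ms out of every period_ms, measured after warmup_cycles full
 * cycles so the average has settled. Toy model with 1ms steps.
 */
static double pelt_mean_util(int run_ms, int period_ms, int warmup_cycles)
{
	double util = 0.0, sum = 0.0;
	int c, t;

	assert(period_ms > 0 && run_ms >= 0 && run_ms <= period_ms);

	for (c = 0; c <= warmup_cycles; c++) {
		sum = 0.0;	/* only the last cycle contributes to the mean */
		for (t = 0; t < period_ms; t++) {
			/* A running step adds (1 - y) * max, an idle step adds 0. */
			double contrib = (t < run_ms) ?
				PELT_MAX * (1.0 - PELT_Y) : 0.0;

			util = util * PELT_Y + contrib;
			sum += util;
		}
	}

	return sum / period_ms;
}
```

Calling pelt_mean_util(2, 10, 100) models the 2ms-every-10ms RT load from the trace; the warmup of 100 cycles is an arbitrary choice that is long compared to the 32-step half-life.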

For the 2nd question, something like this:

--->8---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da46c3164537..80b526c40dab 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8059,7 +8059,7 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
 				 unsigned long *min,
 				 unsigned long *max)
 {
-	unsigned long util, irq, scale;
+	unsigned long util = 0, irq, scale;
 	struct rq *rq = cpu_rq(cpu);
 
 	scale = arch_scale_cpu_capacity(cpu);
@@ -8100,9 +8100,14 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
 	 * CFS tasks and we use the same metric to track the effective
 	 * utilization (PELT windows are synchronized) we can directly add them
 	 * to obtain the CPU's actual utilization.
+	 *
+	 * Only applicable if there are fair tasks queued. When a new fair task
+	 * wakes up it should trigger a freq update.
 	 */
-	util = util_cfs + cpu_util_rt(rq);
-	util += cpu_util_dl(rq);
+	if (rq->cfs.h_nr_queued) {
+		util = util_cfs + cpu_util_rt(rq);
+		util += cpu_util_dl(rq);
+	}
 
 	/*
 	 * The maximum hint is a soft bandwidth requirement, which can be lower
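
To make the intended effect of the hunk above easier to see in isolation, here is a small userspace sketch of just the gated sum. It is not kernel code: struct rq_model and model_busy_util() are hypothetical stand-ins, the IRQ scaling and the min/max handling of effective_cpu_util() are left out, and the clamp to the CPU capacity is a simplification.

```c
#include <assert.h>

/* Hypothetical stand-in for the few runqueue values the hunk touches. */
struct rq_model {
	unsigned int  cfs_h_nr_queued;	/* number of queued fair tasks */
	unsigned long util_rt;		/* stands in for cpu_util_rt(rq) */
	unsigned long util_dl;		/* stands in for cpu_util_dl(rq) */
};

/*
 * Toy version of the gated sum: RT and DL pressure only count when at
 * least one fair task is queued, otherwise the result stays 0.
 * The result is clamped to scale (the CPU capacity).
 */
static unsigned long model_busy_util(const struct rq_model *rq,
				     unsigned long util_cfs,
				     unsigned long scale)
{
	unsigned long util = 0;

	assert(rq != 0);

	if (rq->cfs_h_nr_queued) {
		util = util_cfs + rq->util_rt;
		util += rq->util_dl;
	}

	return util < scale ? util : scale;
}
```

With no fair tasks queued, an RT utilization of 150 contributes nothing in this model; as soon as one fair task is queued, the same RT utilization is added on top of util_cfs.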