linux-kernel - Re: [PATCH] sched/fair: Fix pelt lost idle time detection

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20251223184928.3uqacp5xgzb3jemp@airbuntu>
Date: Tue, 23 Dec 2025 18:49:28 +0000
From: Qais Yousef <qyousef@...alina.io>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Samuel Wu <wusamuel@...gle.com>, mingo@...hat.com, peterz@...radead.org,
	juri.lelli@...hat.com, dietmar.eggemann@....com,
	rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
	vschneid@...hat.com, linux-kernel@...r.kernel.org,
	Android Kernel Team <kernel-team@...roid.com>
Subject: Re: [PATCH] sched/fair: Fix pelt lost idle time detection

On 12/23/25 17:27, Qais Yousef wrote:
> On 12/13/25 04:54, Vincent Guittot wrote:
> 
> > > For completeness, here are some Perfetto traces that show threads
> > > running, CPU frequency, and PELT related stats. I've pinned the
> > > util_avg track for a CPU on the little cluster, as the util_avg metric
> > > shows an obvious increase (~66 vs ~3 for with patch and without patch
> > > respectively).
> > 
> > I was focusing on the update of rq->lost_idle_time but It can't be
> > related because the CPUs are often idle in your trace. But it also
> > updates the rq->clock_idle and rq->clock_pelt_idle which are used to
> > sync cfs task util_avg at wakeup when it is about to migrate and prev
> > cpu is idle.
> > 
> > before the patch we could have old clock_pelt_idle and clock_idle that
> > were used to decay the util_avg of cfs task before migrating them
> > which would ends up with decaying too much util_avg
> > 
> > But I noticed that you put the util_avg_rt which doesn't use the 2
> > fields above in mainline. Does android kernel make some changes for rt
> > util_avg tracking ?
> 
> We shouldn't be doing that. I think we were not updating RT pressure correctly
> before the patch. The new values make more sense to me as RT tasks are running
> 2ms every 10ms and a util_avg_rt of ~150 range makes more sense than the
> previous 5-6 values? If we add the 20% headroom that can easily saturate the
> little core.
> 
> update_rt_rq_load_avg() uses rq_clock_pelt() which takes into account the
> lost_idle_time which we now ensure is updated in this corner case?
> 
> I guess the first question is which do you think is the right behavior for the
> RT pressure?
> 
> And 2nd question, does it make sense to take RT pressure into account in
> schedutil if there are no fair tasks? It is supposed to help compensate for the
> stolen time by RT so we make fair run faster. But if there are no fair tasks,
> the RT pressure is meaningless on its own as they should run at max or whatever
> value specified by uclamp_min? I think in this test uclamp_min is set to 0 by
> default for RT, so expected not to cause frequency to rise on their own.

Something like this

--->8---

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da46c3164537..80b526c40dab 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8059,7 +8059,7 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
                                 unsigned long *min,
                                 unsigned long *max)
 {
-       unsigned long util, irq, scale;
+       unsigned long util = 0, irq, scale;
        struct rq *rq = cpu_rq(cpu);

        scale = arch_scale_cpu_capacity(cpu);
@@ -8100,9 +8100,14 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
         * CFS tasks and we use the same metric to track the effective
         * utilization (PELT windows are synchronized) we can directly add them
         * to obtain the CPU's actual utilization.
+        *
+        * Only applicable if there are fair tasks queued. When a new fair task
+        * wakes up it should trigger a freq update.
         */
-       util = util_cfs + cpu_util_rt(rq);
-       util += cpu_util_dl(rq);
+       if (rq->cfs.h_nr_queued) {
+               util = util_cfs + cpu_util_rt(rq);
+               util += cpu_util_dl(rq);
+       }

        /*
         * The maximum hint is a soft bandwidth requirement, which can be lower