linux-kernel - Re: [PATCH] sched/fair: Fix pelt lost idle time detection

Message-ID: <20260112151126.jsy3ofjmczfnxfgo@airbuntu>
Date: Mon, 12 Jan 2026 15:11:26 +0000
From: Qais Yousef <qyousef@...alina.io>
To: Samuel Wu <wusamuel@...gle.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>, mingo@...hat.com,
	peterz@...radead.org, juri.lelli@...hat.com,
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
	mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org,
	Android Kernel Team <kernel-team@...roid.com>
Subject: Re: [PATCH] sched/fair: Fix pelt lost idle time detection

On 01/05/26 12:08, Samuel Wu wrote:
> On Tue, Dec 23, 2025 at 10:49 AM Qais Yousef <qyousef@...alina.io> wrote:
> >
> > On 12/23/25 17:27, Qais Yousef wrote:
> > > On 12/13/25 04:54, Vincent Guittot wrote:
> > >
> > > > > For completeness, here are some Perfetto traces that show threads
> > > > > running, CPU frequency, and PELT related stats. I've pinned the
> > > > > util_avg track for a CPU on the little cluster, as the util_avg metric
> > > > > > shows an obvious increase (~66 with the patch vs. ~3 without it).
> > > >
> > > > > I was focusing on the update of rq->lost_idle_time but it can't be
> > > > related because the CPUs are often idle in your trace. But it also
> > > > updates the rq->clock_idle and rq->clock_pelt_idle which are used to
> > > > sync cfs task util_avg at wakeup when it is about to migrate and prev
> > > > cpu is idle.
> > > >
> > > > > Before the patch we could have an old clock_pelt_idle and clock_idle that
> > > > > were used to decay the util_avg of cfs tasks before migrating them,
> > > > > which would end up decaying util_avg too much.
> > > >
> > > > > But I noticed that you pinned util_avg_rt, which doesn't use the 2
> > > > > fields above in mainline. Does the Android kernel make some changes
> > > > > for RT util_avg tracking?
> > >
> > > We shouldn't be doing that. I think we were not updating RT pressure correctly
> > > before the patch. The new values make more sense to me as RT tasks are running
> > > 2ms every 10ms and a util_avg_rt of ~150 range makes more sense than the
> > > previous 5-6 values? If we add the 20% headroom that can easily saturate the
> > > little core.
> > >
> > > update_rt_rq_load_avg() uses rq_clock_pelt() which takes into account the
> > > lost_idle_time which we now ensure is updated in this corner case?
> > >
> > > I guess the first question is which do you think is the right behavior for the
> > > RT pressure?
> > >
> > > And 2nd question, does it make sense to take RT pressure into account in
> > > schedutil if there are no fair tasks? It is supposed to help compensate for the
> > > stolen time by RT so we make fair run faster. But if there are no fair tasks,
> > > the RT pressure is meaningless on its own as they should run at max or whatever
> > > value specified by uclamp_min? I think in this test uclamp_min is set to 0 by
> > > default for RT, so expected not to cause frequency to rise on their own.
> >
> > Something like this
> >
> > --->8---
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index da46c3164537..80b526c40dab 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8059,7 +8059,7 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> >                                  unsigned long *min,
> >                                  unsigned long *max)
> >  {
> > -       unsigned long util, irq, scale;
> > +       unsigned long util = 0, irq, scale;
> >         struct rq *rq = cpu_rq(cpu);
> >
> >         scale = arch_scale_cpu_capacity(cpu);
> > @@ -8100,9 +8100,14 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> >          * CFS tasks and we use the same metric to track the effective
> >          * utilization (PELT windows are synchronized) we can directly add them
> >          * to obtain the CPU's actual utilization.
> > +        *
> > +        * Only applicable if there are fair tasks queued. When a new fair task
> > +        * wakes up it should trigger a freq update.
> >          */
> > -       util = util_cfs + cpu_util_rt(rq);
> > -       util += cpu_util_dl(rq);
> > +       if (rq->cfs.h_nr_queued) {
> > +               util = util_cfs + cpu_util_rt(rq);
> > +               util += cpu_util_dl(rq);
> > +       }
> >
> >         /*
> >          * The maximum hint is a soft bandwidth requirement, which can be lower
> 
> I tested Qais's patch with the same use case. There are 3 builds to
> reference now:
> 1. baseline
> 2. baseline + Vincent's patch
> 3. baseline + Vincent's patch + Qais's patch
> 
> Scheduling behavior seems more proper now. I agree with Qais that RT
> values seemed a little off in build 1, and also CPU freq values in
> build 2 seemed too high for the given workload. With build 3, the
> Wattson power values are back to baseline build 1, while keeping RT
> util_avg in build 3 similar to that of build 2.
> 
> - build 1: https://ui.perfetto.dev/#!/?s=6ff6854c87ea187e4ca488acd2e6501b90ec9f6f
> - build 2: https://ui.perfetto.dev/#!/?s=964594d07a5a5ba51a159ba6c90bb7ab48e09326
> - build 3: https://ui.perfetto.dev/#!/?s=c0a1585c31e51ab3e6b38d948829a4c0196f338c

Thanks for verifying! I think we can handle it better with uclamp_max; could
you give that other approach a try if possible please? Thanks!
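
For reference, the gating in the patch quoted above can be modeled in user
space roughly as below. This is only a sketch under stated assumptions, not
kernel code: struct rq_model, its fields, and model_effective_util() are
hypothetical stand-ins, and the IRQ scaling, uclamp and DL bandwidth handling
done by the real effective_cpu_util() are left out. The gate_on_fair flag
exists only so the old and new aggregation can be compared side by side.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for the per-CPU runqueue state. */
struct rq_model {
	unsigned int  cfs_h_nr_queued;	/* number of fair tasks queued */
	unsigned long util_cfs;		/* fair utilization */
	unsigned long util_rt;		/* RT utilization (RT pressure) */
	unsigned long util_dl;		/* deadline utilization */
	unsigned long capacity;		/* CPU capacity, e.g. 1024 */
};

/*
 * Simplified model of the aggregation step only.
 *
 * gate_on_fair == false: always sum CFS + RT + DL (behavior before the
 *                        quoted diff).
 * gate_on_fair == true:  only sum them when fair tasks are queued,
 *                        otherwise report 0 (behavior of the quoted diff).
 *
 * The result is capped at the CPU capacity.
 */
static unsigned long model_effective_util(const struct rq_model *rq,
					  bool gate_on_fair)
{
	unsigned long util = 0;

	if (!gate_on_fair || rq->cfs_h_nr_queued)
		util = rq->util_cfs + rq->util_rt + rq->util_dl;

	return util < rq->capacity ? util : rq->capacity;
}
```

With illustrative numbers (util_rt = 150, no fair tasks queued), the ungated
sum still reports the RT utilization to the frequency selection path, while
the gated version reports 0 until a fair task is enqueued.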