linux-kernel - Re: [PATCH] sched/fair: Fix pelt lost idle time detection

Message-ID: <CAKfTPtDiMvC2Ds0aY4KtR0Zqvj8Ry7OovNbFqvaWSGmjrVxCoA@mail.gmail.com>
Date: Wed, 7 Jan 2026 08:54:43 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Qais Yousef <qyousef@...alina.io>
Cc: Samuel Wu <wusamuel@...gle.com>, mingo@...hat.com, peterz@...radead.org, 
	juri.lelli@...hat.com, dietmar.eggemann@....com, rostedt@...dmis.org, 
	bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com, 
	linux-kernel@...r.kernel.org, Android Kernel Team <kernel-team@...roid.com>
Subject: Re: [PATCH] sched/fair: Fix pelt lost idle time detection

On Tue, 23 Dec 2025 at 19:49, Qais Yousef <qyousef@...alina.io> wrote:
>
> On 12/23/25 17:27, Qais Yousef wrote:
> > On 12/13/25 04:54, Vincent Guittot wrote:
> >
> > > > For completeness, here are some Perfetto traces that show threads
> > > > running, CPU frequency, and PELT-related stats. I've pinned the
> > > > util_avg track for a CPU on the little cluster, as the util_avg metric
> > > > shows an obvious increase (~66 vs ~3 with and without the patch,
> > > > respectively).
> > >
> > > I was focusing on the update of rq->lost_idle_time, but it can't be
> > > related because the CPUs are often idle in your trace. However, the same
> > > path also updates rq->clock_idle and rq->clock_pelt_idle, which are used
> > > to sync a CFS task's util_avg at wakeup when it is about to migrate and
> > > the previous CPU is idle.
> > >
> > > Before the patch, we could have stale clock_pelt_idle and clock_idle
> > > values that were used to decay the util_avg of CFS tasks before migrating
> > > them, which would end up decaying util_avg too much.
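The over-decay Vincent describes can be illustrated with a simplified model. This is a sketch under stated assumptions, not the kernel's migrate_se_pelt_lag(): decay is applied per whole ~1ms period with y^32 = 0.5 in floating point (the kernel uses 1024us periods and fixed-point tables), and pelt_decay(), sync_util_at_migration() and the millisecond timestamps are hypothetical. It shows only the mechanism: the migrating task's util_avg is decayed by the estimated idle time of the previous CPU, so an idle stamp older than the real idle entry inflates that estimate.

```c
#include <assert.h>
#include <stdint.h>

/* Per-period PELT decay factor, chosen so that y^32 ~= 0.5 (simplified). */
#define PELT_Y 0.978572062087700

/* Decay a utilization value over 'periods' idle periods: util * y^periods. */
static double pelt_decay(double util, uint64_t periods)
{
	while (periods--)
		util *= PELT_Y;
	return util;
}

/*
 * Hypothetical model of the wakeup-migration sync: util is decayed by the
 * time the previous CPU is believed to have been idle, estimated here as
 * now_ms - idle_stamp_ms. A stale idle_stamp_ms (left over from an earlier
 * idle entry) makes the estimate, and therefore the decay, too large.
 */
static double sync_util_at_migration(double util, uint64_t now_ms,
				     uint64_t idle_stamp_ms)
{
	assert(now_ms >= idle_stamp_ms);
	return pelt_decay(util, now_ms - idle_stamp_ms);
}
```

For example, a task with util 400 whose CPU really went idle 5ms before the wakeup keeps roughly 359, while the same task synced against a stamp 32ms old is decayed to roughly 200.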
> > >
> > > But I noticed that you showed util_avg_rt, which in mainline doesn't use
> > > the two fields above. Does the Android kernel make any changes to RT
> > > util_avg tracking?
> >
> > We shouldn't be doing that. I think we were not updating RT pressure correctly
> > before the patch. The new values make more sense to me: RT tasks run for
> > 2ms every 10ms, so a util_avg_rt in the ~150 range makes more sense than the
> > previous values of 5-6, no? If we add the 20% headroom, that can easily
> > saturate the little core.
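The estimate for a task running 2ms every 10ms can be sanity-checked with a simplified PELT simulation. Assumptions: one update per ~1ms period, floating-point decay with y^32 = 0.5, and no frequency or CPU-capacity invariance (on a little core both scale the contribution down, so absolute values there will be lower). The headroom helper applies a 1.25x multiplier (util + util/4), assumed here as the way schedutil's ~20% margin is expressed; pelt_step(), pelt_periodic() and with_headroom() are hypothetical names.

```c
#include <assert.h>

#define PELT_Y 0.978572062087700 /* per-period decay, y^32 ~= 0.5 */
#define SCHED_CAPACITY 1024.0

/*
 * Simplified PELT step for one ~1ms period: decay the old value, and if the
 * task ran, add 1024 * (1 - y), so an always-running task converges to 1024.
 * No frequency or capacity scaling is applied (assumption).
 */
static double pelt_step(double util, int running)
{
	util *= PELT_Y;
	if (running)
		util += SCHED_CAPACITY * (1.0 - PELT_Y);
	return util;
}

/*
 * Run 'cycles' repetitions of run_ms running periods followed by idle_ms
 * idle periods, starting from util = 0. Stores util at the end of the last
 * running phase in *peak and at the end of the last idle phase in *trough.
 */
static void pelt_periodic(int run_ms, int idle_ms, int cycles,
			  double *peak, double *trough)
{
	double util = 0.0;

	assert(run_ms >= 0 && idle_ms >= 0 && cycles > 0);
	for (int c = 0; c < cycles; c++) {
		for (int i = 0; i < run_ms; i++)
			util = pelt_step(util, 1);
		*peak = util;
		for (int i = 0; i < idle_ms; i++)
			util = pelt_step(util, 0);
		*trough = util;
	}
}

/* ~25% margin on top of util, i.e. util targets ~80% of the chosen capacity. */
static double with_headroom(double util)
{
	return util + util / 4.0;
}
```

With run_ms = 2 and idle_ms = 8, this model settles into a sawtooth between roughly 187 and 223, around the 20% duty cycle (about 205 of 1024). That is the same order of magnitude as the ~150 seen on the little core and far from 5-6; with_headroom() then lifts the peak to roughly 279.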
> >
> > update_rt_rq_load_avg() uses rq_clock_pelt(), which takes lost_idle_time
> > into account, and we now ensure lost_idle_time is updated in this corner case?
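The dependency Qais points at can be sketched with a userspace toy model. It assumes mainline's kernel/sched/pelt.h behaves as summarized here: rq_clock_pelt() returns rq->clock_pelt - rq->lost_idle_time, clock_pelt advances scaled by capacity and frequency while busy, and idle entry resyncs clock_pelt to clock_task, first adding the gap to lost_idle_time when the rq's summed CFS/RT/DL utilization is saturated. struct toy_rq and the toy_* helpers are hypothetical, the saturation check is reduced to a boolean, and this is not the patch under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the struct rq fields involved (not kernel code). */
struct toy_rq {
	uint64_t clock_task;     /* ns of task time */
	uint64_t clock_pelt;     /* ns, scaled down while running below max */
	uint64_t lost_idle_time; /* ns of idle time that could not be reclaimed */
};

/* Run for delta ns at a combined capacity x frequency scale out of 1024. */
static void toy_run(struct toy_rq *rq, uint64_t delta, uint64_t scale)
{
	assert(scale <= 1024);
	rq->clock_task += delta;
	rq->clock_pelt += (delta * scale) >> 10;
}

/*
 * Enter idle: resync clock_pelt to clock_task. If utilization was saturated,
 * the gap is not treated as idle time and is accounted as lost_idle_time.
 */
static void toy_enter_idle(struct toy_rq *rq, bool util_saturated)
{
	if (util_saturated)
		rq->lost_idle_time += rq->clock_task - rq->clock_pelt;
	rq->clock_pelt = rq->clock_task;
}

/* Same shape as rq_clock_pelt(): the clock seen by the PELT update paths. */
static uint64_t toy_rq_clock_pelt(const struct toy_rq *rq)
{
	return rq->clock_pelt - rq->lost_idle_time;
}
```

Running 4ms at half speed advances clock_pelt by only 2ms. At idle entry the unsaturated rq simply catches up, while the saturated one records the 2ms gap as lost idle time, so any PELT update that reads this clock (as update_rt_rq_load_avg() reads rq_clock_pelt()) sees a clock 2ms behind.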
> >
> > I guess the first question is: which behavior do you think is right for
> > RT pressure?
> >
> > And the second question: does it make sense to take RT pressure into account
> > in schedutil if there are no fair tasks? It is supposed to compensate for the
> > time stolen by RT so that fair tasks run faster. But if there are no fair
> > tasks, RT pressure is meaningless on its own, since RT tasks should run at max
> > or at whatever value uclamp_min specifies? I think in this test uclamp_min is
> > set to 0 by default for RT, so RT tasks are not expected to raise the
> > frequency on their own.
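The effect of the gate proposed in the diff that follows in this message can be shown in isolation. toy_sum_util() is a hypothetical helper that mirrors only the summation step: irq scaling, the uclamp min/max handling and the final clamp in effective_cpu_util() are omitted, and the gate_on_cfs flag exists only to compare the two behaviors. As in the diff, the CFS term is dropped along with RT and DL when no fair task is queued.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified stand-in for the summation in effective_cpu_util(). With
 * gate_on_cfs set, utilization is only summed when at least one fair task
 * is queued (cfs_h_nr_queued != 0); otherwise the result stays 0.
 */
static unsigned long toy_sum_util(unsigned long util_cfs, unsigned long util_rt,
				  unsigned long util_dl,
				  unsigned int cfs_h_nr_queued, bool gate_on_cfs)
{
	unsigned long util = 0;

	if (!gate_on_cfs || cfs_h_nr_queued)
		util = util_cfs + util_rt + util_dl;
	return util;
}
```

With an RT utilization of 150, a small blocked util_cfs of 10 and no queued fair task, the ungated sum is 160 while the gated one is 0, so RT utilization no longer feeds the frequency request until a fair task is queued; that gap is what Vincent's reply questions.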
>
> Something like this
>
> --->8---
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index da46c3164537..80b526c40dab 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8059,7 +8059,7 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>                                  unsigned long *min,
>                                  unsigned long *max)
>  {
> -       unsigned long util, irq, scale;
> +       unsigned long util = 0, irq, scale;
>         struct rq *rq = cpu_rq(cpu);
>
>         scale = arch_scale_cpu_capacity(cpu);
> @@ -8100,9 +8100,14 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>          * CFS tasks and we use the same metric to track the effective
>          * utilization (PELT windows are synchronized) we can directly add them
>          * to obtain the CPU's actual utilization.
> +        *
> +        * Only applicable if there are fair tasks queued. When a new fair task
> +        * wakes up it should trigger a freq update.
>          */

As mentioned in my other reply, it might be too late to handle
everything, and we don't speed up RT and DL tasks enough to leave
margin for CFS.

> -       util = util_cfs + cpu_util_rt(rq);
> -       util += cpu_util_dl(rq);
> +       if (rq->cfs.h_nr_queued) {
> +               util = util_cfs + cpu_util_rt(rq);
> +               util += cpu_util_dl(rq);
> +       }
>
>         /*
>          * The maximum hint is a soft bandwidth requirement, which can be lower