Message-ID: <CAKfTPtAgCBWrb9qYDDTJZy9SXXDBYxGGUjyDg6jh5TfFwFzP=g@mail.gmail.com>
Date: Wed, 7 Jan 2026 08:50:45 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Qais Yousef <qyousef@...alina.io>
Cc: Samuel Wu <wusamuel@...gle.com>, mingo@...hat.com, peterz@...radead.org,
juri.lelli@...hat.com, dietmar.eggemann@....com, rostedt@...dmis.org,
bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com,
linux-kernel@...r.kernel.org, Android Kernel Team <kernel-team@...roid.com>
Subject: Re: [PATCH] sched/fair: Fix pelt lost idle time detection
On Tue, 23 Dec 2025 at 18:27, Qais Yousef <qyousef@...alina.io> wrote:
>
> On 12/13/25 04:54, Vincent Guittot wrote:
>
> > > For completeness, here are some Perfetto traces that show threads
> > > running, CPU frequency, and PELT-related stats. I've pinned the
> > > util_avg track for a CPU on the little cluster, as the util_avg metric
> > > shows an obvious increase (~66 with the patch vs ~3 without).
> >
> > I was focusing on the update of rq->lost_idle_time, but that can't be
> > related because the CPUs are often idle in your trace. The patch also
> > updates rq->clock_idle and rq->clock_pelt_idle, which are used to
> > sync a cfs task's util_avg at wakeup when it is about to migrate and
> > the prev CPU is idle.
> >
> > Before the patch we could have stale clock_pelt_idle and clock_idle
> > values that were used to decay the util_avg of cfs tasks before
> > migrating them, which would end up decaying util_avg too much.
> >
> > But I noticed that you showed util_avg_rt, which doesn't use the 2
> > fields above in mainline. Does the Android kernel make some changes to
> > rt util_avg tracking?
>
> We shouldn't be doing that. I think we were not updating RT pressure correctly
> before the patch. The new values make more sense to me: the RT tasks are running
> 2ms every 10ms, and a util_avg_rt in the ~150 range makes more sense than the
> previous values of 5-6? If we add the 20% headroom, that can easily saturate
> the little core.
>
> update_rt_rq_load_avg() uses rq_clock_pelt(), which takes into account the
> lost_idle_time that we now ensure is updated in this corner case?
But according to Samuel's trace, there is a lot of idle time and the
CPUs are far from being over-utilized, so we don't miss any lost idle
time, which only happens when util_avg reaches 1024 (util_sum >
47791490).
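For reference, here is a rough sketch of that check, paraphrasing what
update_idle_rq_clock_pelt() in kernel/sched/pelt.h does (simplified
from memory, not the verbatim mainline code):

    /*
     * Lost idle time is only accumulated once the rq looks
     * always-running, i.e. the summed util_sum has hit the ceiling
     * that corresponds to util_avg == 1024.
     */
    u32 divider = ((LOAD_AVG_MAX - 1024) << SCHED_CAPACITY_SHIFT) - LOAD_AVG_MAX;
    u32 util_sum = rq->cfs.avg.util_sum + rq->avg_rt.util_sum +
                   rq->avg_dl.util_sum;

    if (util_sum >= divider)
        rq->lost_idle_time += rq_clock_task(rq) - rq->clock_pelt;

With LOAD_AVG_MAX = 47742 and SCHED_CAPACITY_SHIFT = 10, divider is
the 47791490 above.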
>
> I guess the first question is which do you think is the right behavior for the
> RT pressure?
RT pressure reflects the utilization of the CPU by RT tasks. The
tracking is simpler than for cfs (no per-task tracking, no migration
tracking, no util_est ...) but it is still the utilization of the CPU
by RT and as a result should be used when selecting the OPP. In
reality this RT utilization is almost always hidden because we jump to
a high OPP as soon as an RT task is runnable, and max(rt util_avg, rt
uclamp_min) often/always returns uclamp_min.
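To make the "hidden" part concrete (illustration only, with made-up
numbers; in your test uclamp_min is 0 for RT):

    /* RT task running 2ms every 10ms, once util_avg has settled */
    unsigned long rt_util_avg = 2 * 1024 / 10;             /* ~204 */
    unsigned long uclamp_min  = 512;                        /* hypothetical */
    unsigned long rt_request  = rt_util_avg > uclamp_min ?
                                rt_util_avg : uclamp_min;   /* 512 */

With a non-zero uclamp_min the rt util_avg never shows up in the OPP
request; with uclamp_min = 0 it is the only thing left to size the
frequency for the RT task.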
>
> And 2nd question, does it make sense to take RT pressure into account in
> schedutil if there are no fair tasks? It is supposed to help compensate for the
> time stolen by RT so we make fair run faster. But if there are no fair tasks,
> the RT pressure is meaningless on its own, as RT tasks should run at max or
> whatever value is specified by uclamp_min? I think in this test uclamp_min is
> set to 0 by default for RT, so it is not expected to cause the frequency to
> rise on its own.
Interesting that uclamp_min is set to 0 :-). And this is another
reason to keep rt util_avg, otherwise we might not have enough
capacity to run the RT task.
I'm worried that if we don't take rt util_avg into account, the
execution of the RT task will be delayed too much when a cfs task
wakes up, because the increase of the OPP will not be enough.
Let's take the case of an RT task that runs 2ms every 10ms.
Now add a cfs task that also runs 2ms every 10ms but wakes up 8ms
after the RT task.
In order to run these 2 tasks, i.e. 4ms every 10ms, we need a minimum
compute capacity which is rt util_avg + cfs util_avg.
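Back of the envelope with these numbers, assuming util_avg roughly
converges to duty cycle * 1024 (a simplification of PELT):

    unsigned long rt_util  = 2 * 1024 / 10;         /* ~204 */
    unsigned long cfs_util = 2 * 1024 / 10;         /* ~204 */
    unsigned long min_cap  = rt_util + cfs_util;    /* ~408 */

i.e. roughly 40% of a 1024-capacity CPU, which the cfs contribution
alone would underestimate by half.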
>
> >
> > >
> > > - with patch: https://ui.perfetto.dev/#!/?s=964594d07a5a5ba51a159ba6c90bb7ab48e09326
> > > - without patch:
> > > https://ui.perfetto.dev/#!/?s=6ff6854c87ea187e4ca488acd2e6501b90ec9f6f