Message-ID: <aWpJb5n0Hc0vzVmI@vingu-cube>
Date: Fri, 16 Jan 2026 15:21:35 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Alex Hoh (賀振坤) <Alex.Hoh@...iatek.com>
Cc: "wusamuel@...gle.com" <wusamuel@...gle.com>,
"bsegall@...gle.com" <bsegall@...gle.com>,
"vschneid@...hat.com" <vschneid@...hat.com>,
"dietmar.eggemann@....com" <dietmar.eggemann@....com>,
"peterz@...radead.org" <peterz@...radead.org>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"mgorman@...e.de" <mgorman@...e.de>,
"juri.lelli@...hat.com" <juri.lelli@...hat.com>,
"kernel-team@...roid.com" <kernel-team@...roid.com>
Subject: Re: [PATCH] sched/fair: Fix pelt lost idle time detection
Hi Alex,
On Friday 16 Jan 2026 at 06:51:03 (+0000), Alex Hoh (賀振坤) wrote:
> On Sat, 2025-12-13 at 04:54 +0100, Vincent Guittot wrote:
> > Hi Samuel,
> >
> >
> > On Sat, 6 Dec 2025 at 02:20, Samuel Wu <wusamuel@...gle.com> wrote:
> > >
> > > On Fri, Dec 5, 2025 at 4:54 PM Samuel Wu <wusamuel@...gle.com>
> > > wrote:
> > > >
> > > > On Fri, Dec 5, 2025 at 7:08 AM Vincent Guittot
> > > > <vincent.guittot@...aro.org> wrote:
> > > > >
> > > > > On Tue, 2 Dec 2025 at 01:24, Samuel Wu <wusamuel@...gle.com>
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Oct 8, 2025 at 6:12 AM Vincent Guittot
> > > > > > <vincent.guittot@...aro.org> wrote:
> >
> > [...]
> >
> > > > > > >
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I'm seeing a power regression with this patch. This
> > > > >
> > > > > The problem is that this patch fixes incorrect load tracking:
> > > > > load can be underestimated on systems that become loaded.
> > > > >
> > > >
> > > > I feel the patch is doing the proper thing, which is the
> > > > appropriate bookkeeping when idle is the next task. I just wasn't
> > > > 100% sure whether there was some other, unintentional indirect
> > > > impact, so I thought it would be good to send out a report and
> > > > have another set of eyes look over it.
> > > >
> > > > > > test was performed on Pixel 6 running android-mainline
> > > > > > (6.18.0-rc7
> > > > > > based); all scheduling vendor hooks are disabled, and I'm not
> > > > > > seeing
> > > > > > any obvious sched code differences compared to the vanilla
> > > > > > upstream
> > > > > > kernel. I am still actively working to see if I can find a
> > > > > > simpler
> > > > > > sequence to reproduce this on mainline Linux.
> > > > > >
> > > > > > The Wattson tool is reporting an increased average power
> > > > > > (~30-40%)
> > > > > > with the patch vs baseline (patch reverted). This regression
> > > > >
> > > > > For a use case in particular ?
> > > >
> > > > This was for the BouncyBall apk, which is a bouncing ball
> > > > animation. I'm still trying to find a way to reproduce this on
> > > > mainline Linux, but haven't been able to yet. We've also been
> > > > checking internally to root-cause this, but have nothing
> > > > definitive so far.
> > > >
> > > > >
> > > > > > correlates with two other metrics:
> > > > > > 1. Increased residency at higher CPU frequencies
> > > > > > 2. A significant increase in sugov invocations (at least 10x)
> > > > > >
> > > > > > Data in the tables below are collected from a 10s run of a
> > > > > > bouncing
> > > > > > ball animation, with and without the patch.
> > > > > > +----------------------------+------------+---------------+
> > > > > > |                            | with patch | without patch |
> > > > > > +----------------------------+------------+---------------+
> > > > > > | sugov invocation rate (Hz) |      133.5 |           3.7 |
> > > > > > +----------------------------+------------+---------------+
> > > > > >
> > > > > > +------------+------------------+------------------+
> > > > > > |            | with patch:      | without patch:   |
> > > > > > | Freq (kHz) | time spent (ms)  | time spent (ms)  |
> > > > > > +------------+------------------+------------------+
> > > > > > |     738000 |             4869 |             9869 |
> > > > > > |    1803000 |             2936 |               68 |
> > > > > > |    1598000 |             1072 |                0 |
> > > > > > |    1704000 |              674 |                0 |
> > > > > > |        ... |              ... |              ... |
> > > > > > +------------+------------------+------------------+
> > > > > >
> > > > > > Thanks!
> > > > > > Sam
> > >
> > > For completeness, here are some Perfetto traces that show threads
> > > running, CPU frequency, and PELT related stats. I've pinned the
> > > util_avg track for a CPU on the little cluster, as the util_avg
> > > metric
> > > shows an obvious increase (~66 vs ~3 for with patch and without
> > > patch
> > > respectively).
> >
> > I was focusing on the update of rq->lost_idle_time, but it can't be
> > related because the CPUs are often idle in your trace. However, the
> > patch also updates rq->clock_idle and rq->clock_pelt_idle, which are
> > used to sync a cfs task's util_avg at wakeup when it is about to
> > migrate and the prev CPU is idle.
> >
> > Before the patch, stale clock_pelt_idle and clock_idle values could
> > be used to decay the util_avg of cfs tasks before migrating them,
> > which would end up decaying util_avg too much.
> >
> > But I noticed that you showed util_avg_rt, which doesn't use the two
> > fields above in mainline. Does the Android kernel make any changes
> > to rt util_avg tracking?
>
>
> I believe this change can indeed account for the observed increase in
> RT util.
>
> When prev is the last RT task on the rq, the scheduler proceeds through
> the CFS pick-next flow. With this patch, that path advances
> rq_clock_pelt to the current time. However, updating rq_clock_pelt at
> this stage does not seem correct, as RT util has not yet been updated.
You're right: put_prev happens after we have updated the pelt clock.
Samuel,
Could you try the fix below ?
---
kernel/sched/fair.c | 6 ------
kernel/sched/idle.c | 3 +++
2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 108213e94158..bea71564d3da 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8989,12 +8989,6 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
goto again;
}
- /*
- * rq is about to be idle, check if we need to update the
- * lost_idle_time of clock_pelt
- */
- update_idle_rq_clock_pelt(rq);
-
return NULL;
}
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 65eb8f8c1a5d..91af6acafe44 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -260,6 +260,9 @@ static void do_idle(void)
{
int cpu = smp_processor_id();
+ /* Sync pelt clock with rq's one */
+ update_idle_rq_clock_pelt(cpu_rq(cpu));
+
/*
* Check if we need to update blocked load
*/
--
2.43.0
>
> The RT util update actually occurs later in put_prev_set_next_task(),
> and it relies on the original value of rq_clock_pelt as input. Since
> rq_clock_pelt has already been overwritten by the time the RT util
> update takes place, the original timestamp is lost.
>
> As a result, the intended CPU/frequency capacity scaling behavior is
> disrupted, causing RT util to increase more rapidly than expected. This
> appears to be an unintended consequence introduced by the patch.
>
> >
> > >
> > > - with patch:
> > > https://ui.perfetto.dev/#!/?s=964594d07a5a5ba51a159ba6c90bb7ab48e09326
> > > - without patch:
> > > https://ui.perfetto.dev/#!/?s=6ff6854c87ea187e4ca488acd2e6501b90ec9f6f