Message-ID: <8cf19bf0e0054dcfed70e9935029201694f1bb5a.camel@mediatek.com>
Date: Fri, 16 Jan 2026 06:51:03 +0000
From: Alex Hoh (賀振坤) <Alex.Hoh@...iatek.com>
To: "vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
"wusamuel@...gle.com" <wusamuel@...gle.com>
CC: "bsegall@...gle.com" <bsegall@...gle.com>, "vschneid@...hat.com"
<vschneid@...hat.com>, "dietmar.eggemann@....com" <dietmar.eggemann@....com>,
"peterz@...radead.org" <peterz@...radead.org>, "rostedt@...dmis.org"
<rostedt@...dmis.org>, "mingo@...hat.com" <mingo@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"mgorman@...e.de" <mgorman@...e.de>, "juri.lelli@...hat.com"
<juri.lelli@...hat.com>, "kernel-team@...roid.com" <kernel-team@...roid.com>
Subject: Re: [PATCH] sched/fair: Fix pelt lost idle time detection
On Sat, 2025-12-13 at 04:54 +0100, Vincent Guittot wrote:
> Hi Samuel,
>
>
> On Sat, 6 Dec 2025 at 02:20, Samuel Wu <wusamuel@...gle.com> wrote:
> >
> > On Fri, Dec 5, 2025 at 4:54 PM Samuel Wu <wusamuel@...gle.com>
> > wrote:
> > >
> > > On Fri, Dec 5, 2025 at 7:08 AM Vincent Guittot
> > > <vincent.guittot@...aro.org> wrote:
> > > >
> > > > On Tue, 2 Dec 2025 at 01:24, Samuel Wu <wusamuel@...gle.com>
> > > > wrote:
> > > > >
> > > > > On Wed, Oct 8, 2025 at 6:12 AM Vincent Guittot
> > > > > <vincent.guittot@...aro.org> wrote:
>
> [...]
>
> > > > > >
> > > > >
> > > > > Hi all,
> > > > >
> > > > > I am seeing a power regression with this patch. This
> > > >
> > > > The problem is that this patch is about fixing wrong load
> > > > tracking, which can be underestimated on systems that become
> > > > loaded.
> > > >
> > >
> > > I feel the patch is doing the proper thing, which is the
> > > appropriate bookkeeping when idle is the next task. I just wasn't
> > > 100% sure if there was some other indirect impact that was
> > > unintentional, so I thought it would be good to send a report out
> > > and have another set of eyes look over it.
> > >
> > > > > test was performed on Pixel 6 running android-mainline
> > > > > (6.18.0-rc7
> > > > > based); all scheduling vendor hooks are disabled, and I'm not
> > > > > seeing
> > > > > any obvious sched code differences compared to the vanilla
> > > > > upstream
> > > > > kernel. I am still actively working to see if I can find a
> > > > > simpler
> > > > > sequence to reproduce this on mainline Linux.
> > > > >
> > > > > The Wattson tool is reporting an increased average power
> > > > > (~30-40%)
> > > > > with the patch vs baseline (patch reverted). This regression
> > > >
> > > > For a particular use case?
> > >
> > > This was for the BouncyBall apk, which is a bouncing ball
> > > animation. I'm still trying to find a way to reproduce this on
> > > Linux, but so far haven't been able to. We've also been checking
> > > internally to root-cause this, but nothing definitive yet.
> > >
> > > >
> > > > > correlates with two other metrics:
> > > > > 1. Increased residency at higher CPU frequencies
> > > > > 2. A significant increase in sugov invocations (at least 10x)
> > > > >
> > > > > Data in the tables below are collected from a 10s run of a
> > > > > bouncing
> > > > > ball animation, with and without the patch.
> > > > > +----------------------------+------------+---------------+
> > > > > |                            | with patch | without patch |
> > > > > +----------------------------+------------+---------------+
> > > > > | sugov invocation rate (Hz) |      133.5 |           3.7 |
> > > > > +----------------------------+------------+---------------+
> > > > >
> > > > > +------------+-----------------+-----------------+
> > > > > |            | with patch:     | without patch:  |
> > > > > | Freq (kHz) | time spent (ms) | time spent (ms) |
> > > > > +------------+-----------------+-----------------+
> > > > > |     738000 |            4869 |            9869 |
> > > > > |    1803000 |            2936 |              68 |
> > > > > |    1598000 |            1072 |               0 |
> > > > > |    1704000 |             674 |               0 |
> > > > > |        ... |             ... |             ... |
> > > > > +------------+-----------------+-----------------+
> > > > >
> > > > > Thanks!
> > > > > Sam
> >
> > For completeness, here are some Perfetto traces that show threads
> > running, CPU frequency, and PELT-related stats. I've pinned the
> > util_avg track for a CPU on the little cluster, as the util_avg
> > metric shows an obvious increase (~66 with the patch vs ~3 without).
>
> I was focusing on the update of rq->lost_idle_time, but it can't be
> related because the CPUs are often idle in your trace. However, the
> patch also updates rq->clock_idle and rq->clock_pelt_idle, which are
> used to sync a cfs task's util_avg at wakeup when it is about to
> migrate and the prev CPU is idle.
>
> Before the patch, we could have had stale clock_pelt_idle and
> clock_idle values that were used to decay the util_avg of cfs tasks
> before migrating them, which would end up decaying util_avg too much.
>
> But I noticed that you included util_avg_rt, which doesn't use the
> two fields above in mainline. Does the Android kernel make some
> changes to rt util_avg tracking?
I believe this change can indeed account for the observed increase in
RT util.

When prev is the last RT task on the rq, the scheduler proceeds through
the CFS pick-next flow. With this patch, that path advances
rq_clock_pelt to the current time. However, updating rq_clock_pelt at
this stage does not seem correct, as RT util has not yet been updated.
The RT util update actually occurs later in put_prev_set_next_task(),
and it relies on the original value of rq_clock_pelt as input. Since
rq_clock_pelt has already been overwritten by the time the RT util
update takes place, the original timestamp is lost.

As a result, the intended CPU/frequency capacity scaling behavior is
disrupted, causing RT util to increase more rapidly than expected. This
appears to be an unintended consequence introduced by the patch.
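
To make the effect concrete, here is a small userspace toy model. This
is not kernel code; the clock values and the reduction of PELT to a
single "delta since last update" are only illustrative assumptions
about the ordering described above:

/*
 * Toy model of the ordering issue described above. NOT kernel code;
 * all numbers are made up for illustration.
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Last RT PELT update happened at scaled time 0. */
	uint64_t last_rt_update = 0;

	/*
	 * An RT task then ran for 10000us of wall-clock time at roughly
	 * half of max capacity/frequency, so the scaled clock only
	 * advanced by about 5000us.
	 */
	uint64_t clock      = 10000;	/* rq->clock      (wall clock)   */
	uint64_t clock_pelt =  5000;	/* rq->clock_pelt (scaled clock) */

	/*
	 * Expected order: the RT signal accrues the scaled delta first,
	 * and only then is clock_pelt synced to clock as the CPU goes
	 * idle.
	 */
	uint64_t rt_delta_expected = clock_pelt - last_rt_update;

	/*
	 * Order described above: the CFS pick-next path syncs clock_pelt
	 * to clock first, and the RT update in put_prev_set_next_task()
	 * reads the already-overwritten value afterwards.
	 */
	clock_pelt = clock;		/* idle sync happens first */
	uint64_t rt_delta_observed = clock_pelt - last_rt_update;

	printf("expected RT delta: %llu us\n",
	       (unsigned long long)rt_delta_expected);	/*  5000 us */
	printf("observed RT delta: %llu us\n",
	       (unsigned long long)rt_delta_observed);	/* 10000 us */

	return 0;
}

In this toy case the RT signal accrues roughly twice the time it
should, which would be consistent with util_avg_rt growing faster than
expected in the trace.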
>
> >
> > - with patch:
> > https://ui.perfetto.dev/#!/?s=964594d07a5a5ba51a159ba6c90bb7ab48e09326
> > - without patch:
> > https://ui.perfetto.dev/#!/?s=6ff6854c87ea187e4ca488acd2e6501b90ec9f6f