Message-ID: <CAKfTPtCUVcSZbFDLswaT25xjKuqx9fD57Kz_Di8ZDMsEhmnjWw@mail.gmail.com>
Date: Sat, 13 Dec 2025 04:54:32 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Samuel Wu <wusamuel@...gle.com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org,
Android Kernel Team <kernel-team@...roid.com>
Subject: Re: [PATCH] sched/fair: Fix pelt lost idle time detection
Hi Samuel,
On Sat, 6 Dec 2025 at 02:20, Samuel Wu <wusamuel@...gle.com> wrote:
>
> On Fri, Dec 5, 2025 at 4:54 PM Samuel Wu <wusamuel@...gle.com> wrote:
> >
> > On Fri, Dec 5, 2025 at 7:08 AM Vincent Guittot
> > <vincent.guittot@...aro.org> wrote:
> > >
> > > On Tue, 2 Dec 2025 at 01:24, Samuel Wu <wusamuel@...gle.com> wrote:
> > > >
> > > > On Wed, Oct 8, 2025 at 6:12 AM Vincent Guittot
> > > > <vincent.guittot@...aro.org> wrote:
[...]
> > > > >
> > > >
> > > > Hi all,
> > > >
> > > > I am seeing a power regression with this patch. This
> > >
> > > The problem is that this patch fixes incorrect load tracking,
> > > which could be underestimated on systems that become loaded.
> > >
> >
> > I feel the patch is doing the proper thing, which is the appropriate
> > bookkeeping when idle is the next task. I just wasn't 100% sure
> > whether there was some other unintentional indirect impact, so I
> > thought it would be good to send a report out and have another set of
> > eyes look over it.
> >
> > > > test was performed on Pixel 6 running android-mainline (6.18.0-rc7
> > > > based); all scheduling vendor hooks are disabled, and I'm not seeing
> > > > any obvious sched code differences compared to the vanilla upstream
> > > > kernel. I am still actively working to see if I can find a simpler
> > > > sequence to reproduce this on mainline Linux.
> > > >
> > > > The Wattson tool is reporting an increased average power (~30-40%)
> > > > with the patch vs baseline (patch reverted). This regression
> > >
> > > For a use case in particular ?
> >
> > This was for BouncyBall apk, which is a bouncing ball animation. I'm
> > still trying to find a method to reproduce this on Linux, but still
> > haven't been able to. Also we've been checking internally to root
> > cause this, but nothing definitive yet.
> >
> > >
> > > > correlates with two other metrics:
> > > > 1. Increased residency at higher CPU frequencies
> > > > 2. A significant increase in sugov invocations (at least 10x)
> > > >
> > > > Data in the tables below are collected from a 10s run of a bouncing
> > > > ball animation, with and without the patch.
> > > > +----------------------------+------------+---------------+
> > > > |                            | with patch | without patch |
> > > > +----------------------------+------------+---------------+
> > > > | sugov invocation rate (Hz) | 133.5      | 3.7           |
> > > > +----------------------------+------------+---------------+
> > > >
> > > > +------------+------------------+------------------+
> > > > |            | with patch:      | without patch:   |
> > > > | Freq (kHz) | time spent (ms)  | time spent (ms)  |
> > > > +------------+------------------+------------------+
> > > > | 738000     | 4869             | 9869             |
> > > > | 1803000    | 2936             | 68               |
> > > > | 1598000    | 1072             | 0                |
> > > > | 1704000    | 674              | 0                |
> > > > | ...        | ...              | ...              |
> > > > +------------+------------------+------------------+
> > > >
> > > > Thanks!
> > > > Sam
>
> For completeness, here are some Perfetto traces that show threads
> running, CPU frequency, and PELT related stats. I've pinned the
> util_avg track for a CPU on the little cluster, as the util_avg metric
> shows an obvious increase (~66 vs ~3 for with patch and without patch
> respectively).
I was focusing on the update of rq->lost_idle_time, but it can't be
related because the CPUs are often idle in your trace. However, the
patch also updates rq->clock_idle and rq->clock_pelt_idle, which are
used to sync a cfs task's util_avg at wakeup when the task is about to
migrate and its prev cpu is idle.

Before the patch, we could have stale clock_pelt_idle and clock_idle
values that were used to decay the util_avg of cfs tasks before
migrating them, which would end up decaying util_avg too much.

But I noticed that you showed util_avg_rt, which doesn't use the 2
fields above in mainline. Does the android kernel make some changes to
rt util_avg tracking?
>
> - with patch: https://ui.perfetto.dev/#!/?s=964594d07a5a5ba51a159ba6c90bb7ab48e09326
> - without patch: https://ui.perfetto.dev/#!/?s=6ff6854c87ea187e4ca488acd2e6501b90ec9f6f