Message-ID: <CAG2KctohebWb11cAGD-7VyMi8od=+P6amA1sZGtVHRqhq68r=Q@mail.gmail.com>
Date: Fri, 5 Dec 2025 16:54:55 -0800
From: Samuel Wu <wusamuel@...gle.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org,
Android Kernel Team <kernel-team@...roid.com>
Subject: Re: [PATCH] sched/fair: Fix pelt lost idle time detection
On Fri, Dec 5, 2025 at 7:08 AM Vincent Guittot
<vincent.guittot@...aro.org> wrote:
>
> On Tue, 2 Dec 2025 at 01:24, Samuel Wu <wusamuel@...gle.com> wrote:
> >
> > On Wed, Oct 8, 2025 at 6:12 AM Vincent Guittot
> > <vincent.guittot@...aro.org> wrote:
> > >
> > > The check for some lost idle pelt time should always be done when
> > > pick_next_task_fair() fails to pick a task, and not only when we call
> > > it from the fair fast-path.
> > >
> > > The case happens when the last running task on the rq is an RT or DL
> > > task. When the latter goes to sleep and the /Sum of util_sum of the rq
> > > is at the max value, we don't account for the lost idle time whereas we
> > > should.
> > >
> > > Fixes: 67692435c411 ("sched: Rework pick_next_task() slow-path")
> > > Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> > > ---
> > >
> > > I noticed this while reviewing [1]
> > >
> > > [1] https://lore.kernel.org/all/20251006105453.648473106@infradead.org/
> > >
> > > kernel/sched/fair.c | 26 +++++++++++++-------------
> > > 1 file changed, 13 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index b3be1e2749ce..dd0ea01af730 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8920,21 +8920,21 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
> > > return p;
> > >
> > > idle:
> > > - if (!rf)
> > > - return NULL;
> > > -
> > > - new_tasks = sched_balance_newidle(rq, rf);
> > > + if (rf) {
> > > + new_tasks = sched_balance_newidle(rq, rf);
> > >
> > > - /*
> > > - * Because sched_balance_newidle() releases (and re-acquires) rq->lock, it is
> > > - * possible for any higher priority task to appear. In that case we
> > > - * must re-start the pick_next_entity() loop.
> > > - */
> > > - if (new_tasks < 0)
> > > - return RETRY_TASK;
> > > + /*
> > > + * Because sched_balance_newidle() releases (and re-acquires)
> > > + * rq->lock, it is possible for any higher priority task to
> > > + * appear. In that case we must re-start the pick_next_entity()
> > > + * loop.
> > > + */
> > > + if (new_tasks < 0)
> > > + return RETRY_TASK;
> > >
> > > - if (new_tasks > 0)
> > > - goto again;
> > > + if (new_tasks > 0)
> > > + goto again;
> > > + }
> > >
> > > /*
> > > * rq is about to be idle, check if we need to update the
> > > --
> > > 2.43.0
> > >
> >
> > Hi all,
> >
> > I am seeing a power regression with this patch. This
>
> The problem is that this patch is about fixing wrong load tracking,
> which can end up underestimated on systems that become loaded.
>
I feel the patch is doing the proper thing, which is the appropriate
bookkeeping when idle is the next task. I just wasn't 100% sure whether
there was some other indirect impact that was unintentional, so I
thought it would be good to send a report out and have another set of
eyes look over it.
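
For reference, my mental model of the bookkeeping in question is
roughly the following simplified sketch of update_idle_rq_clock_pelt()
(field names are paraphrased from kernel/sched/pelt.h and the
saturation threshold is a hypothetical stand-in, so this is not the
exact upstream code):

static inline void update_idle_rq_clock_pelt_sketch(struct rq *rq)
{
	/* hypothetical stand-in for the saturation threshold in pelt.h */
	u32 divider = UTIL_SUM_SATURATED;
	u32 util_sum = rq->cfs.avg.util_sum;

	util_sum += rq->avg_rt.util_sum;
	util_sum += rq->avg_dl.util_sum;

	/*
	 * If the rq looks fully busy (util_sum saturated), the gap between
	 * clock_task and clock_pelt cannot be handed back as idle decay;
	 * remember it as lost idle time instead.
	 */
	if (util_sum >= divider)
		rq->lost_idle_time += rq_clock_task(rq) - rq->clock_pelt;
}

As I read the diff, the functional change is only that this check now
also runs when rf is NULL instead of returning early, which matches the
commit message.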
> > test was performed on Pixel 6 running android-mainline (6.18.0-rc7
> > based); all scheduling vendor hooks are disabled, and I'm not seeing
> > any obvious sched code differences compared to the vanilla upstream
> > kernel. I am still actively working to see if I can find a simpler
> > sequence to reproduce this on mainline Linux.
> >
> > The Wattson tool is reporting a ~30-40% increase in average power
> > with the patch vs baseline (patch reverted). This regression
>
> For a particular use case?
This was for the BouncyBall apk, which is a bouncing ball animation.
I'm still trying to find a way to reproduce this on mainline Linux, but
haven't been able to so far. We've also been checking internally to
root-cause this, but nothing definitive yet.
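
In case it helps reproduction, the scenario from the commit message (an
RT task being the last runner on the rq before it goes to sleep while
util_sum is saturated) looks like it should be triggerable with
something as small as a pinned SCHED_FIFO busy/sleep loop. A rough,
untested userspace sketch (the CPU number, priority and duty cycle are
arbitrary, and it may need a longer busy phase to really saturate
util_sum):

/* rt_busy_sleep.c - pin a SCHED_FIFO task to one CPU, busy-loop, then sleep */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static void busy_wait_ms(long ms)
{
	struct timespec start, now;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while ((now.tv_sec - start.tv_sec) * 1000L +
		 (now.tv_nsec - start.tv_nsec) / 1000000L < ms);
}

int main(void)
{
	struct sched_param sp = { .sched_priority = 50 };
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(0, &set);	/* arbitrary CPU */

	if (sched_setaffinity(0, sizeof(set), &set) ||
	    sched_setscheduler(0, SCHED_FIFO, &sp)) {
		perror("setup");
		return 1;
	}

	for (;;) {
		busy_wait_ms(50);	/* RT task is the last runner... */
		usleep(10000);		/* ...then sleeps, rq goes idle */
	}
}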
>
> > correlates with two other metrics:
> > 1. Increased residency at higher CPU frequencies
> > 2. A significant increase in sugov invocations (at least 10x)
> >
> > Data in the tables below are collected from a 10s run of a bouncing
> > ball animation, with and without the patch.
> > +-----------------------------+--------------+-----------------+
> > |                             | with patch   | without patch   |
> > +-----------------------------+--------------+-----------------+
> > | sugov invocation rate (Hz)  | 133.5        | 3.7             |
> > +-----------------------------+--------------+-----------------+
> >
> > +--------------+----------------------+----------------------+
> > |              | with patch:          | without patch:       |
> > | Freq (kHz)   | time spent (ms)      | time spent (ms)      |
> > +--------------+----------------------+----------------------+
> > | 738000       | 4869                 | 9869                 |
> > | 1803000      | 2936                 | 68                   |
> > | 1598000      | 1072                 | 0                    |
> > | 1704000      | 674                  | 0                    |
> > | ...          | ...                  | ...                  |
> > +--------------+----------------------+----------------------+
> >
> > Thanks!
> > Sam
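
For cross-checking the residency numbers on mainline, they should be
comparable to what cpufreq's time_in_state stats report (assuming
CONFIG_CPU_FREQ_STAT=y; the policy0 path below is just an example),
sampled before and after a run with something like:

/* dump_residency.c - print per-frequency residency for one cpufreq policy */
#include <stdio.h>

int main(void)
{
	const char *path =
		"/sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state";
	char line[128];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);
		return 1;
	}

	/* each line is "<freq_khz> <residency in 10ms units>" */
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);

	fclose(f);
	return 0;
}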