linux-kernel - Re: [PATCH 00/24] Complete EEVDF

Message-ID: <CAGETcx_1pZCtWiBbDmUcxEw3abF5dr=XdFCkH9zXWK75g7457w@mail.gmail.com>
Date: Tue, 26 Nov 2024 15:32:40 -0800
From: Saravana Kannan <saravanak@...gle.com>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: Samuel Wu <wusamuel@...gle.com>, Peter Zijlstra <peterz@...radead.org>, 
	Luis Machado <luis.machado@....com>, David Dai <davidai@...gle.com>, mingo@...hat.com, 
	juri.lelli@...hat.com, vincent.guittot@...aro.org, dietmar.eggemann@....com, 
	rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com, 
	linux-kernel@...r.kernel.org, wuyun.abel@...edance.com, 
	youssefesmat@...omium.org, tglx@...utronix.de, efault@....de, 
	Android Kernel Team <kernel-team@...roid.com>, Qais Yousef <qyousef@...gle.com>, 
	Vincent Palomares <paillon@...gle.com>, John Stultz <jstultz@...gle.com>
Subject: Re: [PATCH 00/24] Complete EEVDF

On Sun, Nov 10, 2024 at 8:08 PM K Prateek Nayak <kprateek.nayak@....com> wrote:
>
> Hello Sam,
>
> On 11/9/2024 4:47 AM, Samuel Wu wrote:
> > On Thu, Nov 7, 2024 at 11:08 PM Saravana Kannan <saravanak@...gle.com> wrote:
> >>
> >> On Wed, Nov 6, 2024 at 4:07 AM Luis Machado <luis.machado@....com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> On 11/6/24 11:09, Peter Zijlstra wrote:
> >>>> On Wed, Nov 06, 2024 at 11:49:00AM +0530, K Prateek Nayak wrote:
> >>>>
> >>>>> Since delayed entities are still on the runqueue, they can affect PELT
> >>>>> calculation. Vincent and Dietmar have both noted this and Peter posted
> >>>>> https://lore.kernel.org/lkml/172595576232.2215.18027704125134691219.tip-bot2@tip-bot2/
> >>>>> in response but it was pulled out since Luis reported observing -ve
> >>>>> values for h_nr_delayed on his setup. A lot has been fixed around
> >>>>> delayed dequeue since and I wonder if now would be the right time to
> >>>>> re-attempt h_nr_delayed tracking.
> >>>>
> >>>> Yeah, it's something I meant to get back to. I think the patch as posted
> >>>> was actually right and it didn't work for Luis because of some other,
> >>>> since fixed issue.
> >>>>
> >>>> But I might be misremembering things. I'll get to it eventually :/
> >>>
> >>> Sorry for the late reply, I got sidetracked on something else.
> >>>
> >>> There have been a few power regressions (based on our Pixel6-based testing) due
> >>> to the delayed-dequeue series.
> >>>
> >>> The main one drove the frequencies up due to an imbalance in the uclamp inc/dec
> >>> handling. That has since been fixed by "[PATCH 10/24] sched/uclamg: Handle delayed dequeue". [1]
> >>>
> >>> The bug also made it so disabling DELAY_DEQUEUE at runtime didn't fix things, because the
> >>> imbalance/stale state would be perpetuated. Disabling DELAY_DEQUEUE before boot did fix things.
> >>>
> >>> So power use was brought down by the above fix, but some issues still remained, like the
> >>> accounting issues with h_nr_running and not taking sched_delayed tasks into account.
> >>>
> >>> Dietmar addressed some of it with "kernel/sched: Fix util_est accounting for DELAY_DEQUEUE". [2]
> >>>
> >>> Peter sent another patch to add accounting for sched_delayed tasks [3]. Though the patch was
> >>> mostly correct, under some circumstances [4] we spotted imbalances in the sched_delayed
> >>> accounting that slowly drove frequencies up again.
> >>>
> >>> If I recall correctly, Peter has pulled that particular patch from the tree, but we should
> >>> definitely revisit it with a proper fix for the imbalance. Suggestion in [5].
> >>>
> >>> [1] https://lore.kernel.org/lkml/20240727105029.315205425@infradead.org/
> >>> [2] https://lore.kernel.org/lkml/c49ef5fe-a909-43f1-b02f-a765ab9cedbf@arm.com/
> >>> [3] https://lore.kernel.org/lkml/172595576232.2215.18027704125134691219.tip-bot2@tip-bot2/
> >>> [4] https://lore.kernel.org/lkml/6df12fde-1e0d-445f-8f8a-736d11f9ee41@arm.com/
> >>> [5] https://lore.kernel.org/lkml/6df12fde-1e0d-445f-8f8a-736d11f9ee41@arm.com/
> >>
> >> Thanks for the replies. We are trying to disable DELAY_DEQUEUE and
> >> recollect the data to see if that's the cause. We'll get back to this
> >> thread once we have some data.
> >>
> >> -Saravana
> >
> > The test data is back to pre-EEVDF state with DELAY_DEQUEUE disabled.
> >
> > Same test example from before, when the thread is affined to the big cluster:
> > +-----------------+---------+----------+
> > | Data            | Enabled | Disabled |
> > +-----------------+---------+----------+
> > | 5th percentile  | 96      | 143      |
> > | Median          | 144     | 147      |
> > | Mean            | 134     | 147      |
> > | 95th percentile | 150     | 150      |
> > +-----------------+---------+----------+
> >
> > What are the next steps to bring this behavior back? Will DELAY_DEQUEUE always
> > be enabled by default and/or is there a fix coming for 6.12?
>
> DELAY_DEQUEUE should be enabled by default from v6.12 but there are a
> few fixes for the same in-flight. Could you try running with the changes
> from [1] and [2] and see if you can still reproduce the behavior and,
> if you can, whether it is equally bad?
>
> Both changes apply cleanly for me on top of current
>
>      git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core
>
> at commit fe9beaaa802d ("sched: No PREEMPT_RT=y for all{yes,mod}config")
> when applied in that order.
>
> [1] https://lore.kernel.org/lkml/172595576232.2215.18027704125134691219.tip-bot2@tip-bot2/
> [2] https://lore.kernel.org/lkml/750542452c4f852831e601e1b8de40df4b108d9a.camel@gmx.de/

Have these changes landed in 6.12? Or will they land in 6.13?

We tested 6.12 and the issue we reported is still present. What should
we do for any products we want to ship on 6.12? Disable Delayed
Dequeue or backport any fixes to 6.12 LTS?

Peter/Vincent, do you plan on backporting the future fixes to the 6.12
LTS kernel? Anything else we can do to help with making sure this is
fixed on the LTS kernel?

-Saravana
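
For reference, the runtime toggle discussed in this thread can be sketched as below. This is a minimal sketch, not something posted by anyone on the thread: it assumes a kernel with debugfs mounted at /sys/kernel/debug and the scheduler feature file at /sys/kernel/debug/sched/features, and the function names and the SCHED_FEATURES override variable are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Assumed location of the scheduler feature flags (requires root and a
# mounted debugfs). SCHED_FEATURES is a hypothetical override so the
# helpers can be pointed at another file.
FEATURES="${SCHED_FEATURES:-/sys/kernel/debug/sched/features}"

# Print "enabled", "disabled" or "unknown" for DELAY_DEQUEUE.
# The file lists enabled features by name and disabled ones with a
# NO_ prefix, separated by spaces.
delay_dequeue_state() {
    if tr ' ' '\n' < "$FEATURES" | grep -qx 'NO_DELAY_DEQUEUE'; then
        echo disabled
    elif tr ' ' '\n' < "$FEATURES" | grep -qx 'DELAY_DEQUEUE'; then
        echo enabled
    else
        echo unknown
    fi
}

# Writing a feature name with the NO_ prefix turns that feature off.
disable_delay_dequeue() {
    echo NO_DELAY_DEQUEUE > "$FEATURES"
}
```

Calling disable_delay_dequeue and then delay_dequeue_state on a live system would be one way to confirm the flag before re-running a test. Note Luis's point earlier in the thread: before the uclamp fix, turning the feature off at runtime did not clear the stale state, so results from a runtime toggle on an unfixed kernel may not match a kernel that booted with the feature disabled.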