linux-kernel - Re: Very high scheduling delay with plenty of idle CPUs

Message-ID: <CAKfTPtCs8wCoUvNgxNcqi5ozDiRBrLLkuA4Edi1bu1UZLsV-Vg@mail.gmail.com>
Date: Tue, 12 Nov 2024 10:03:19 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Saravana Kannan <saravanak@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>, K Prateek Nayak <kprateek.nayak@....com>, 
	Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, 
	Benjamin Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Valentin Schneider <vschneid@...hat.com>, LKML <linux-kernel@...r.kernel.org>, 
	wuyun.abel@...edance.com, youssefesmat@...omium.org, 
	Thomas Gleixner <tglx@...utronix.de>, efault@....de, John Stultz <jstultz@...gle.com>, 
	Vincent Palomares <paillon@...gle.com>, Tobias Huschle <huschle@...ux.ibm.com>
Subject: Re: Very high scheduling delay with plenty of idle CPUs

On Tue, 12 Nov 2024 at 08:24, Saravana Kannan <saravanak@...gle.com> wrote:
>
> On Mon, Nov 11, 2024 at 11:12 AM Vincent Guittot
> <vincent.guittot@...aro.org> wrote:
> >
> > On Mon, 11 Nov 2024 at 20:01, Vincent Guittot
> > <vincent.guittot@...aro.org> wrote:
> > >
> > > On Mon, 11 Nov 2024 at 19:24, Saravana Kannan <saravanak@...gle.com> wrote:
> > > >
> > > > On Mon, Nov 11, 2024 at 2:41 AM Peter Zijlstra <peterz@...radead.org> wrote:
> > > > >
> > > > > On Sun, Nov 10, 2024 at 10:15:07PM -0800, Saravana Kannan wrote:
> > > > >
> > > > > > I actually quickly hacked up the cpu_overutilized() function to return
> > > > > > true during suspend/resume and the threads are nicely spread out and
> > > > > > running in parallel. That actually reduces the total of the
> > > > > > dpm_resume*() phases from 90ms to 75ms on my Pixel 6.
> > > > >
> > > > > Right, so that kills EAS and makes it fall through to the regular
> > > > > select_idle_sibling() thing.
> > > > >
> > > > > > Peter,
> > > > > >
> > > > > > Would you be open to the scheduler being aware of
> > > > > > dpm_suspend*()/dpm_resume*() phases and triggering the CPU
> > > > > > overutilized behavior during these phases? I know it's very use-case-specific
> > > > > > behavior, but how often do we NOT want to speed up
> > > > > > suspend/resume? We can make this a CONFIG or a kernel command line
> > > > > > option -- say, fast_suspend or something like that.
> > > > >
> > > > > Well, I don't mind if Vincent doesn't. It seems like a very
> > > > > specific/targeted thing and should not affect much else, so it is a
> > > > > relatively safe thing to do.
> > > > >
> > > > > Perhaps a more direct hack in is_rd_overutilized() would be even less
> > > > > invasive; changing cpu_overutilized() relies on that getting propagated
> > > > > to rd->overutilized, so we might as well skip that step, no?
> > > >
> > > > is_rd_overutilized() sounds good to me. Outside of setting a flag in
> > >
> > > For now I'm not convinced that this is a solution; it's just a quick
> > > hack for your problem. We must first understand what is wrong.
> >
> > And you would be better off switching to the performance cpufreq
> > governor to disable EAS and run at max freq if you want to further
> > decrease latency.
>
> Ohhh... now that you mention fixing CPU frequencies, a lot of systems
> fix their CPU frequencies during suspend/resume. Pixel 6 is one of
> them. In the case of Pixel 6, the driver sets the policy min/max to
> these fixed frequencies to force the CPU to stay at one frequency.
> Will EAS handle this correctly? I wonder if that'd affect the task

AFAICT, it should

> placement decision. Also, other systems might limit CPU frequencies in
> ways that EAS can't tell. If the CPU frequencies are frozen, I'm not
> sure EAS makes a lot of sense. Except maybe using CPU max capacity to
> make sure little CPUs are busy first before using the big CPUs?
>
> But even if EAS thinks the CPU freq could go up (when it can't), it
> still doesn't make a lot of sense to not use those idle CPUs and
> instead try to bump up the frequency (by putting more threads in a
> CPU).

In this case, you just need to run the below before entering suspend
  echo 0 > /proc/sys/kernel/sched_energy_aware
and the below after resuming
  echo 1 > /proc/sys/kernel/sched_energy_aware
instead of hacking overutilized.
This will disable EAS without rebuilding sched domain.
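
A minimal C sketch of the same toggle, for a userspace suspend helper that would rather not shell out. The helper names set_energy_aware() and get_energy_aware() are hypothetical; the only real interface is the sysctl file, where writing 0 disables EAS and writing 1 re-enables it. The path is passed in as a parameter so the logic can be tried against an ordinary file; on a real device writing it needs root, and the file only exists on kernels built with EAS support.

```c
#include <assert.h>
#include <stdio.h>

/* Real sysctl file: "0" disables EAS, "1" enables it. */
#define SCHED_ENERGY_AWARE_PATH "/proc/sys/kernel/sched_energy_aware"

/* Hypothetical helper: write 0 or 1 to the given sysctl-style file.
 * Returns 0 on success, -1 on failure. */
static int set_energy_aware(const char *path, int enable)
{
    assert(enable == 0 || enable == 1);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%d\n", enable) > 0;
    /* fclose() flushes; if that fails, the value did not land. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read the current value back.
 * Returns 0 or 1, or -1 if the file can't be opened or parsed. */
static int get_energy_aware(const char *path)
{
    int val = -1;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%d", &val) != 1)
        val = -1;
    fclose(f);
    return val;
}
```

On a device one would call set_energy_aware(SCHED_ENERGY_AWARE_PATH, 0) before suspend and set it back after resume. Restoring the value read with get_energy_aware() beforehand, rather than a hard-coded 1, avoids re-enabling EAS on a system where it was already off.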

>
> Anyway, with all this in mind, it makes more sense to me to just
> trigger the "overutilized" mode during these specific parts of
> suspend/resume.
>
> -Saravana
>
> >
> > >
> > > > sched.c that the suspend/resume code sets/clears, I can't think of an
> > > > interface that's better at avoiding abuse. Let me know if you have
> > > > any. Otherwise, I'll just go with the flag option. If Vincent gets the
> > > > scheduler to do the right thing without this, I'll happily drop this
> > > > targeted hack.
> > > >
> > > > -Saravana
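
For comparison, a userspace model of the flag-based hack discussed in this thread: short-circuit is_rd_overutilized() with a flag that the suspend/resume code sets and clears. This is a sketch under stated assumptions, not a patch: the struct is a stripped-down stand-in for the kernel's root_domain, energy_enabled stands in for sched_energy_enabled(), and the flag and setter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's struct root_domain (model only). */
struct root_domain {
    bool overutilized;  /* set when a CPU in the domain is overutilized */
};

/* Stands in for sched_energy_enabled(). */
static bool energy_enabled = true;

/* Hypothetical flag living in the scheduler, toggled around the
 * dpm_suspend*()/dpm_resume*() phases. */
static bool suspend_resume_in_progress;

/* Hypothetical hook the suspend/resume code would call. */
static void sched_set_suspend_resume(bool active)
{
    suspend_resume_in_progress = active;
}

/* Modeled loosely on the kernel's check: EAS placement is skipped when
 * EAS is off or the root domain is overutilized. The extra flag forces
 * the "overutilized" answer while suspend/resume is in progress. */
static bool is_rd_overutilized(const struct root_domain *rd)
{
    return !energy_enabled || rd->overutilized || suspend_resume_in_progress;
}
```

Returning true here is what makes wakeup placement skip the EAS path and fall through to select_idle_sibling(), which is the spreading effect Saravana measured with his cpu_overutilized() hack.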