linux-kernel - Re: Very high scheduling delay with plenty of idle CPUs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGETcx-pFmBSkVfQ2tAitunb+1uZ_wE6b1+H-4jdAM_0SxJjtQ@mail.gmail.com>
Date: Tue, 12 Nov 2024 08:25:59 -0800
From: Saravana Kannan <saravanak@...gle.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Peter Zijlstra <peterz@...radead.org>, K Prateek Nayak <kprateek.nayak@....com>, 
	Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, 
	Benjamin Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Valentin Schneider <vschneid@...hat.com>, LKML <linux-kernel@...r.kernel.org>, 
	wuyun.abel@...edance.com, youssefesmat@...omium.org, 
	Thomas Gleixner <tglx@...utronix.de>, efault@....de, John Stultz <jstultz@...gle.com>, 
	Vincent Palomares <paillon@...gle.com>, Tobias Huschle <huschle@...ux.ibm.com>
Subject: Re: Very high scheduling delay with plenty of idle CPUs

On Tue, Nov 12, 2024 at 1:03 AM Vincent Guittot
<vincent.guittot@...aro.org> wrote:
>
> On Tue, 12 Nov 2024 at 08:24, Saravana Kannan <saravanak@...gle.com> wrote:
> >
> > On Mon, Nov 11, 2024 at 11:12 AM Vincent Guittot
> > <vincent.guittot@...aro.org> wrote:
> > >
> > > On Mon, 11 Nov 2024 at 20:01, Vincent Guittot
> > > <vincent.guittot@...aro.org> wrote:
> > > >
> > > > On Mon, 11 Nov 2024 at 19:24, Saravana Kannan <saravanak@...gle.com> wrote:
> > > > >
> > > > > On Mon, Nov 11, 2024 at 2:41 AM Peter Zijlstra <peterz@...radead.org> wrote:
> > > > > >
> > > > > > On Sun, Nov 10, 2024 at 10:15:07PM -0800, Saravana Kannan wrote:
> > > > > >
> > > > > > > I actually quickly hacked up the cpu_overutilized() function to return
> > > > > > > true during suspend/resume and the threads are nicely spread out and
> > > > > > > running in parallel. That actually reduces the total of the
> > > > > > > dpm_resume*() phases from 90ms to 75ms on my Pixel 6.
> > > > > >
> > > > > > Right, so that kills EAS and makes it fall through to the regular
> > > > > > select_idle_sibling() thing.
> > > > > >
> > > > > > > Peter,
> > > > > > >
> > > > > > > Would you be open to the scheduler being aware of
> > > > > > > dpm_suspend*()/dpm_resume*() phases and triggering the CPU
> > > > > > > overutilized behavior during these phases? I know it's a very use case
> > > > > > > specific behavior but how often do we NOT want to speed up
> > > > > > > suspend/resume? We can make this a CONFIG or a kernel command line
> > > > > > > option -- say, fast_suspend or something like that.
> > > > > >
> > > > > > Well, I don't mind if Vincent doesn't. It seems like a very
> > > > > > specific/targeted thing and should not affect much else, so it is a
> > > > > > relatively safe thing to do.
> > > > > >
> > > > > > Perhaps a more direct hack in is_rd_overutilized() would be even less
> > > > > > invasive, changing cpu_overutilized() relies on that getting propagated
> > > > > > to rd->overutilized, might as well skip that step, no?
> > > > >
> > > > > is_rd_overutilized() sounds good to me. Outside of setting a flag in
> > > >
> > > > At know I'm not convinced that this is a solution but just a quick
> > > > hack for your problem. We must understand 1st what is wrong
> > >
> > > And you should better switch to performance cpufreq governor to
> > > disable eas and run at max freq if your further wants to decrease
> > > latency
> >
> > Ohhh... now that you mention fixing CPU frequencies, a lot of systems
> > fix their CPU frequencies during suspend/resume. Pixel 6 is one of
> > them. In the case of Pixel 6, the driver sets the policy min/max to
> > these fixed frequencies to force the CPU to stay at one frequency.
> > Will EAS handle this correctly? I wonder if that'd affect the task
>
> AFAICT, it should

To be clear, I'm not opposed to any sched fixes that will do the right
thing naturally.

> > placement decision. Also, other systems might limit CPU frequencies in
> > ways that EAS can't tell. If the CPU frequencies are frozen, I'm not
> > sure EAS makes a lot of sense. Except maybe using CPU max capacity to
> > make sure little CPUs are busy first before using the big CPUs?
> >
> > But even if EAS thinks the CPU freq could go up (when it can't), it
> > still doesn't make a lot of sense to not use those idle CPUs and
> > instead try to bump up the frequency (by putting more threads in a
> > CPU).
>
> In this case, you just need to call the below before entering suspend
> and after resuming
>   echo 1 > /proc/sys/kernel/sched_energy_aware
> instead of hacking overutilized
> This will disable EAS without rebuilding sched domain

That disables EAS for a huge portion of the suspend/resume where we do
want it to be enabled.

Also, as I said before, I want to do this only for the "devices
resume" part where there is a lot of parallelism. Not for the entire
system suspend/resume.

Is there an in-kernel version of this call? Do I just need to set and
clear sysctl_sched_energy_aware? Also, does setting/clearing
overutilized rebuild the sched domain?

Thanks,
Saravana

>
> >
> > Anyway, with all this in mind, it makes more sense to me to just
> > trigger the "overutilized" mode during these specific parts of
> > suspend/resume.
> >
> > -Saravana
> >
> > >
> > > >
> > > > > sched.c that the suspend/resume code sets/clears, I can't think of an
> > > > > interface that's better at avoiding abuse. Let me know if you have
> > > > > any. Otherwise, I'll just go with the flag option. If Vincent gets the
> > > > > scheduler to do the right thing without this, I'll happily drop this
> > > > > targeted hack.
> > > > >
> > > > > -Saravana