Message-ID: <CAKfTPtAyWEvVMDR4cT_nu9fw47rb-Rjm6X-C5UJE0ZRFzdROrQ@mail.gmail.com>
Date: Tue, 12 Nov 2024 18:00:48 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Saravana Kannan <saravanak@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>, K Prateek Nayak <kprateek.nayak@....com>,
Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>,
Benjamin Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>, LKML <linux-kernel@...r.kernel.org>,
wuyun.abel@...edance.com, youssefesmat@...omium.org,
Thomas Gleixner <tglx@...utronix.de>, efault@....de, John Stultz <jstultz@...gle.com>,
Vincent Palomares <paillon@...gle.com>, Tobias Huschle <huschle@...ux.ibm.com>
Subject: Re: Very high scheduling delay with plenty of idle CPUs
On Tue, 12 Nov 2024 at 17:26, Saravana Kannan <saravanak@...gle.com> wrote:
>
> On Tue, Nov 12, 2024 at 1:03 AM Vincent Guittot
> <vincent.guittot@...aro.org> wrote:
> >
> > On Tue, 12 Nov 2024 at 08:24, Saravana Kannan <saravanak@...gle.com> wrote:
> > >
> > > On Mon, Nov 11, 2024 at 11:12 AM Vincent Guittot
> > > <vincent.guittot@...aro.org> wrote:
> > > >
> > > > On Mon, 11 Nov 2024 at 20:01, Vincent Guittot
> > > > <vincent.guittot@...aro.org> wrote:
> > > > >
> > > > > On Mon, 11 Nov 2024 at 19:24, Saravana Kannan <saravanak@...gle.com> wrote:
> > > > > >
> > > > > > On Mon, Nov 11, 2024 at 2:41 AM Peter Zijlstra <peterz@...radead.org> wrote:
> > > > > > >
> > > > > > > On Sun, Nov 10, 2024 at 10:15:07PM -0800, Saravana Kannan wrote:
> > > > > > >
> > > > > > > > I actually quickly hacked up the cpu_overutilized() function to return
> > > > > > > > true during suspend/resume and the threads are nicely spread out and
> > > > > > > > running in parallel. That actually reduces the total of the
> > > > > > > > dpm_resume*() phases from 90ms to 75ms on my Pixel 6.
> > > > > > >
> > > > > > > Right, so that kills EAS and makes it fall through to the regular
> > > > > > > select_idle_sibling() thing.
> > > > > > >
> > > > > > > > Peter,
> > > > > > > >
> > > > > > > > Would you be open to the scheduler being aware of
> > > > > > > > dpm_suspend*()/dpm_resume*() phases and triggering the CPU
> > > > > > > > overutilized behavior during these phases? I know it's a very use case
> > > > > > > > specific behavior but how often do we NOT want to speed up
> > > > > > > > suspend/resume? We can make this a CONFIG or a kernel command line
> > > > > > > > option -- say, fast_suspend or something like that.
> > > > > > >
> > > > > > > Well, I don't mind if Vincent doesn't. It seems like a very
> > > > > > > specific/targeted thing and should not affect much else, so it is a
> > > > > > > relatively safe thing to do.
> > > > > > >
> > > > > > > Perhaps a more direct hack in is_rd_overutilized() would be even less
> > > > > > > invasive, changing cpu_overutilized() relies on that getting propagated
> > > > > > > to rd->overutilized, might as well skip that step, no?
> > > > > >
> > > > > > is_rd_overutilized() sounds good to me. Outside of setting a flag in
> > > > >
> > > > > > For now, I'm not convinced that this is a solution rather than just a
> > > > > > quick hack for your problem. We must first understand what is wrong.
> > > >
> > > > > And you would be better off switching to the performance cpufreq
> > > > > governor to disable EAS and run at max freq if you want to decrease
> > > > > latency further.
> > >
> > > Ohhh... now that you mention fixing CPU frequencies, a lot of systems
> > > fix their CPU frequencies during suspend/resume. Pixel 6 is one of
> > > them. In the case of Pixel 6, the driver sets the policy min/max to
> > > these fixed frequencies to force the CPU to stay at one frequency.
> > > Will EAS handle this correctly? I wonder if that'd affect the task
> >
> > AFAICT, it should
>
> To be clear, I'm not opposed to any sched fixes that will do the right
> thing naturally.
A quick try on rb5, while I continue testing my rework of the EAS patch,
doesn't show the problem. I still need to check with the current EAS
version.
>
> > > placement decision. Also, other systems might limit CPU frequencies in
> > > ways that EAS can't tell. If the CPU frequencies are frozen, I'm not
> > > sure EAS makes a lot of sense. Except maybe using CPU max capacity to
> > > make sure little CPUs are busy first before using the big CPUs?
> > >
> > > But even if EAS thinks the CPU freq could go up (when it can't), it
> > > still doesn't make a lot of sense to not use those idle CPUs and
> > > instead try to bump up the frequency (by putting more threads in a
> > > CPU).
> >
> > In this case, you just need to disable EAS before entering suspend
> > and re-enable it after resuming, instead of hacking overutilized:
> > echo 0 > /proc/sys/kernel/sched_energy_aware   (before suspend)
> > echo 1 > /proc/sys/kernel/sched_energy_aware   (after resume)
> > This will disable EAS without rebuilding sched domain
>
> That disables EAS for a huge portion of the suspend/resume where we do
> want it to be enabled.
>
> Also, as I said before, I want to do this only for the "devices
> resume" part where there is a lot of parallelism. Not for the entire
> system suspend/resume.
Would this really be a problem? You might not get EAS disabled for
exactly the portion you want, but on the other hand, you want to speed up
suspend/resume.
I mean, if systems already fix the CPU frequencies during suspend/resume,
they can just disable EAS as well. EAS will be disabled but
sched_asym_cpucapacity will remain enabled.
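As a rough illustration of the userspace side of this suggestion, here is
a minimal sketch in C of a helper that writes 0 (disable EAS) or 1 (enable
EAS) to the sched_energy_aware sysctl. The helper name set_energy_aware()
and its path parameter are made up for this example; only the sysctl path
comes from the thread. Writing the real file requires root and a platform
where EAS is available, and nothing here hooks into suspend/resume itself.

```c
#include <stdio.h>

/* Sysctl path named in the thread. */
#define SCHED_ENERGY_AWARE_PATH "/proc/sys/kernel/sched_energy_aware"

/*
 * Hypothetical helper: write 0 (disable EAS) or 1 (enable EAS) to the
 * given file. Returns 0 on success, -1 on failure. The path is a
 * parameter so the helper can also be pointed at an ordinary file.
 */
static int set_energy_aware(const char *path, int enable)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;

    ok = fprintf(f, "%d\n", enable ? 1 : 0) > 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A suspend wrapper would call set_energy_aware(SCHED_ENERGY_AWARE_PATH, 0)
before suspending and set_energy_aware(SCHED_ENERGY_AWARE_PATH, 1) after
resume, which covers the whole suspend/resume cycle rather than only the
device-resume phase.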
>
> Is there an in-kernel version of this call? Do I just need to set and
> clear sysctl_sched_energy_aware? Also, does setting/clearing
No, it ends up updating a static key.
> overutilized rebuild the sched domain?
No.
But the system is not overutilized, as you mentioned in your description;
you have a scheduling latency constraint on kworker threads.
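For readers following the two options, here is a small userspace model in
C of the flag-based idea discussed in this thread: a check in the style of
is_rd_overutilized() that also reports "overutilized" while a resume flag
is set. This is only a sketch and not kernel code. struct rd_model,
eas_enabled, and pm_force_spread are names invented for this example, and
the real kernel check may differ in detail.

```c
#include <stdbool.h>

/* Simplified stand-in for the kernel's root domain state. */
struct rd_model {
    bool overutilized;  /* models rd->overutilized */
};

/* Stand-in for "EAS is enabled" (a static key in the kernel). */
static bool eas_enabled = true;

/* HYPOTHETICAL flag the dpm_resume*() code would set and clear. */
static bool pm_force_spread = false;

/*
 * Returns true when task placement should skip the energy-aware path
 * and fall through to the regular idle-CPU search.
 */
static bool is_rd_overutilized_model(const struct rd_model *rd)
{
    /* With EAS off, the energy-aware path is never taken. */
    if (!eas_enabled)
        return true;

    /* Hypothetical hack: behave as overutilized during device resume. */
    if (pm_force_spread)
        return true;

    return rd->overutilized;
}
```

In this model, setting pm_force_spread has the same effect on placement as
clearing eas_enabled, but it is scoped to whichever phase sets the flag
and leaves the real overutilized state untouched.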
>
> Thanks,
> Saravana
>
> >
> > >
> > > Anyway, with all this in mind, it makes more sense to me to just
> > > trigger the "overutilized" mode during these specific parts of
> > > suspend/resume.
> > >
> > > -Saravana
> > >
> > > >
> > > > >
> > > > > > sched.c that the suspend/resume code sets/clears, I can't think of an
> > > > > > interface that's better at avoiding abuse. Let me know if you have
> > > > > > any. Otherwise, I'll just go with the flag option. If Vincent gets the
> > > > > > scheduler to do the right thing without this, I'll happily drop this
> > > > > > targeted hack.
> > > > > >
> > > > > > -Saravana