linux-kernel - Re: Very high scheduling delay with plenty of idle CPUs

Message-ID: <CAGETcx_7LYuZi356mD2j7bcZReobQE0MjoT8vdtgvdN_L2t9ww@mail.gmail.com>
Date: Mon, 11 Nov 2024 23:23:47 -0800
From: Saravana Kannan <saravanak@...gle.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Peter Zijlstra <peterz@...radead.org>, K Prateek Nayak <kprateek.nayak@....com>, 
	Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, 
	Benjamin Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Valentin Schneider <vschneid@...hat.com>, LKML <linux-kernel@...r.kernel.org>, 
	wuyun.abel@...edance.com, youssefesmat@...omium.org, 
	Thomas Gleixner <tglx@...utronix.de>, efault@....de, John Stultz <jstultz@...gle.com>, 
	Vincent Palomares <paillon@...gle.com>, Tobias Huschle <huschle@...ux.ibm.com>
Subject: Re: Very high scheduling delay with plenty of idle CPUs

On Mon, Nov 11, 2024 at 11:12 AM Vincent Guittot
<vincent.guittot@...aro.org> wrote:
>
> On Mon, 11 Nov 2024 at 20:01, Vincent Guittot
> <vincent.guittot@...aro.org> wrote:
> >
> > On Mon, 11 Nov 2024 at 19:24, Saravana Kannan <saravanak@...gle.com> wrote:
> > >
> > > On Mon, Nov 11, 2024 at 2:41 AM Peter Zijlstra <peterz@...radead.org> wrote:
> > > >
> > > > On Sun, Nov 10, 2024 at 10:15:07PM -0800, Saravana Kannan wrote:
> > > >
> > > > > I actually quickly hacked up the cpu_overutilized() function to return
> > > > > true during suspend/resume and the threads are nicely spread out and
> > > > > running in parallel. That actually reduces the total of the
> > > > > dpm_resume*() phases from 90ms to 75ms on my Pixel 6.
> > > >
> > > > Right, so that kills EAS and makes it fall through to the regular
> > > > select_idle_sibling() thing.
> > > >
> > > > > Peter,
> > > > >
> > > > > Would you be open to the scheduler being aware of
> > > > > dpm_suspend*()/dpm_resume*() phases and triggering the CPU
> > > > > overutilized behavior during these phases? I know it's a very
> > > > > use-case-specific behavior, but how often do we NOT want to speed up
> > > > > suspend/resume? We can make this a CONFIG or a kernel command line
> > > > > option -- say, fast_suspend or something like that.
> > > >
> > > > Well, I don't mind if Vincent doesn't. It seems like a very
> > > > specific/targeted thing and should not affect much else, so it is a
> > > > relatively safe thing to do.
> > > >
> > > > Perhaps a more direct hack in is_rd_overutilized() would be even less
> > > > invasive; changing cpu_overutilized() relies on that getting propagated
> > > > to rd->overutilized, so we might as well skip that step, no?
> > >
> > > is_rd_overutilized() sounds good to me. Outside of setting a flag in
> >
> > For now, I'm not convinced that this is a solution rather than just a
> > quick hack for your problem. We must first understand what is wrong.
>
> And you had better switch to the performance cpufreq governor to
> disable EAS and run at max frequency if you want to decrease latency
> further.

Ohhh... now that you mention fixing CPU frequencies, a lot of systems
fix their CPU frequencies during suspend/resume. Pixel 6 is one of
them. In the case of Pixel 6, the driver sets the policy min/max to
these fixed frequencies to force the CPU to stay at one frequency.
Will EAS handle this correctly? I wonder if that'd affect the task
placement decision. Also, other systems might limit CPU frequencies in
ways that EAS can't tell. If the CPU frequencies are frozen, I'm not
sure EAS makes a lot of sense. Except maybe using CPU max capacity to
make sure little CPUs are busy first before using the big CPUs?

But even if EAS thinks the CPU freq could go up (when it can't), it
still doesn't make a lot of sense to not use those idle CPUs and
instead try to bump up the frequency (by putting more threads on a
CPU).

Anyway, with all this in mind, it makes more sense to me to just
trigger the "overutilized" mode during these specific parts of
suspend/resume.

-Saravana

>
> >
> > > sched.c that the suspend/resume code sets/clears, I can't think of an
> > > interface that's better at avoiding abuse. Let me know if you have
> > > any. Otherwise, I'll just go with the flag option. If Vincent gets the
> > > scheduler to do the right thing without this, I'll happily drop this
> > > targeted hack.
> > >
> > > -Saravana