lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZvUlB8s-zIkDQji7@google.com>
Date: Thu, 26 Sep 2024 09:10:31 +0000
From: Quentin Perret <qperret@...gle.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
	mgorman@...e.de, vschneid@...hat.com, lukasz.luba@....com,
	rafael.j.wysocki@...el.com, linux-kernel@...r.kernel.org,
	qyousef@...alina.io, hongyan.xia2@....com
Subject: Re: [RFC PATCH 4/5] sched/fair: Use EAS also when overutilized

Hi Vincent,

On Wednesday 25 Sep 2024 at 15:27:45 (+0200), Vincent Guittot wrote:
> On Fri, 20 Sept 2024 at 18:17, Quentin Perret <qperret@...gle.com> wrote:
> >
> > Hi Vincent,
> >
> > On Friday 30 Aug 2024 at 15:03:08 (+0200), Vincent Guittot wrote:
> > > Keep looking for an energy efficient CPU even when the system is
> > > overutilized and use the CPU returned by feec() if it has been able to find
> > > one. Otherwise fallback to the default performance and spread mode of the
> > > scheduler.
> > > A system can become overutilized for a short time when workers of a
> > > workqueue wake up for a short background work like vmstat update.
> > > Continuing to look for a energy efficient CPU will prevent to break the
> > > power packing of tasks.
> > >
> > > Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> > > ---
> > >  kernel/sched/fair.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 2273eecf6086..e46af2416159 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8505,7 +8505,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> > >                   cpumask_test_cpu(cpu, p->cpus_ptr))
> > >                       return cpu;
> > >
> > > -             if (!is_rd_overutilized(this_rq()->rd)) {
> > > +             if (sched_energy_enabled()) {
> >
> > As mentioned during LPC, when there is no idle time on a CPU, the
> > utilization value of the tasks running on it is no longer a good
> > approximation for how much the tasks want, it becomes an image of how
> > much CPU time they were given. That is particularly problematic in the
> > co-scheduling case, but not just.
> 
> Yes, this is not always true when overutilized and  true after a
> certain amount of time. When a CPU is fully utilized without any idle
> time anymore, feec() will not find a CPU for the task

Well the problem is that is might actually find a CPU for the task -- a
co-scheduled task can obviously look arbitrarily small from a util PoV.

> >
> > IOW, when we're OU, the util values are bogus, so using feec() is frankly
> > wrong IMO. If we don't have a good idea of how long tasks want to run,
> 
> Except that the CPU is not already fully busy without idle time when
> the system is overutilized. We have  ~20% margin on each CPU which
> means that system are overutilized as soon as one CPU is more than 80%
> utilized which is far from not having idle time anymore. So even when
> OU, it doesn't mean that all CPUs don't have idle time and most of the
> time the opposite happens and feec() can still make a useful decision.

My problem with the proposed change here is that it doesn't at all
distinguish between the truly overloaded case (when we have more compute
demand that resources) from a system with a stable-ish utilization at
90%. If you're worried about the latter, then perhaps we should think
about redefining the OU threshold some other way (either by simply
making higher or configurable, or changing its nature to look at the
last time we actually got idle time in the system). But I'm still rather
opinionated that util-based placement is wrong for the former.

And for what it's worth, in my experience if any of the big CPUs get
anywhere near the top of their OPP range, given that the power/perf
curve is exponential it's being penny-wise and pound-foolish to
micro-optimise the placement of the other smaller tasks from an energy
PoV at the same time. But if we can show that it helps real use-cases,
then why not.

> Also, when there is no idle time on a CPU, the task doesn't fit and
> feec() doesn't return a CPU.

It doesn't fit on that CPU but might still (incorrectly) fit on another
CPU right?

> Then, the old way to compute invariant utilization was particularly
> sensible to the overutilized state because the utilization was capped
> and asymptotically converging to max cpu compute capacity but this is
> not true with the new pelt and we can go above compute capacity of the
> cpu and remain correct as long as we are able to increase the compute
> capacity before that there is no idle time. In theory, the utilization
> "could" be correct until we reach 1024 (for utilization or runnable)
> and there is no way to catch up the temporary under compute capacity.
> 
> > the EM just can't help us with anything so we should stay away from it.
> >
> > I understand how just plain bailing out as we do today is sub-optimal,
> > but whatever we do to improve on that can't be doing utilization-based
> > task placement.
> >
> > Have you considered making the default (non-EAS) wake-up path a little
> > more reluctant to migrations when EAS is enabled? That should allow us
> > to maintain a somewhat stable task placement when OU is only transient
> > (e.g. due to misfit), but without using util values when we really
> > shouldn't.
> >
> > Thoughts?
> 
> As mentioned above OU doesn't mean no idle time anymore and in this
> case utilization is still relevant

OK, but please distinguish this from the truly overloaded case somehow,
I really don't think we can 'break' it just to help with the corner case
when we've got 90% ish util.

> In would be in favor of adding
> more performance related decision into feec() similarly to have is
> done in patch 3 which would be for example that if a cpu doesn't fit
> we could still return  a CPU with more performance focus

Fine with me in principle as long as we stop using utilization as a
proxy for how much a task wants when it really isn't that any more.

Thanks!
Quentin

> >
> > Thanks,
> > Quentin
> >
> > >                       new_cpu = find_energy_efficient_cpu(p, prev_cpu);
> > >                       if (new_cpu >= 0)
> > >                               return new_cpu;
> > > --
> > > 2.34.1
> > >
> > >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ