Message-ID: <CAKfTPtCvwPq+8pQcTZePiee9EXxKAQS=J57X2OavWFrQwkDt5A@mail.gmail.com>
Date: Wed, 25 Sep 2024 15:27:45 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Quentin Perret <qperret@...gle.com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com, 
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com, 
	mgorman@...e.de, vschneid@...hat.com, lukasz.luba@....com, 
	rafael.j.wysocki@...el.com, linux-kernel@...r.kernel.org, qyousef@...alina.io, 
	hongyan.xia2@....com
Subject: Re: [RFC PATCH 4/5] sched/fair: Use EAS also when overutilized

On Fri, 20 Sept 2024 at 18:17, Quentin Perret <qperret@...gle.com> wrote:
>
> Hi Vincent,
>
> On Friday 30 Aug 2024 at 15:03:08 (+0200), Vincent Guittot wrote:
> > Keep looking for an energy efficient CPU even when the system is
> > overutilized and use the CPU returned by feec() if it has been able to find
> > one. Otherwise fallback to the default performance and spread mode of the
> > scheduler.
> > A system can become overutilized for a short time when workers of a
> > workqueue wake up for a short background work like vmstat update.
> > Continuing to look for a energy efficient CPU will prevent to break the
> > power packing of tasks.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> > ---
> >  kernel/sched/fair.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 2273eecf6086..e46af2416159 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8505,7 +8505,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> >                   cpumask_test_cpu(cpu, p->cpus_ptr))
> >                       return cpu;
> >
> > -             if (!is_rd_overutilized(this_rq()->rd)) {
> > +             if (sched_energy_enabled()) {
>
> As mentioned during LPC, when there is no idle time on a CPU, the
> utilization value of the tasks running on it is no longer a good
> approximation for how much the tasks want, it becomes an image of how
> much CPU time they were given. That is particularly problematic in the
> co-scheduling case, but not just.

Yes, but this is not always true when overutilized; it only becomes
true after a certain amount of time. When a CPU is fully utilized and
has no idle time left, feec() will not find a CPU for the task.

>
> IOW, when we're OU, the util values are bogus, so using feec() is frankly
> wrong IMO. If we don't have a good idea of how long tasks want to run,

Except that the CPU is not already fully busy without idle time when
the system is overutilized. We have a ~20% margin on each CPU, which
means that the system is overutilized as soon as one CPU is more than
80% utilized, which is far from having no idle time left. So being OU
doesn't mean that the CPUs have no idle time left; most of the time
the opposite happens and feec() can still make a useful decision.
Also, when there is no idle time on a CPU, the task doesn't fit and
feec() doesn't return a CPU.
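
To make the ~20% margin concrete, here is a minimal standalone sketch
(userspace C, not kernel code). The 1280/1024 ratio mirrors the kernel's
fits_capacity() macro; the helper name rd_overutilized_sketch() and its
array-based interface are made up for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Same ratio as the kernel's fits_capacity() macro: util fits only
 * while it stays below ~80% of the capacity (1024/1280).
 */
static bool fits_capacity(unsigned long util, unsigned long capacity)
{
	return util * 1280 < capacity * 1024;
}

/*
 * Hypothetical helper: the system counts as overutilized as soon as
 * one CPU's utilization no longer fits its capacity.
 */
static bool rd_overutilized_sketch(const unsigned long *util,
				   const unsigned long *capacity,
				   size_t nr_cpus)
{
	for (size_t i = 0; i < nr_cpus; i++)
		if (!fits_capacity(util[i], capacity[i]))
			return true;
	return false;
}
```

With a capacity of 1024, a single CPU at util 850 is enough to flag the
whole system as overutilized, even though that CPU is still roughly 17%
idle and the other CPUs may be mostly idle.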

Then, the old way to compute invariant utilization was particularly
sensitive to the overutilized state because the utilization was capped
and asymptotically converged to the max CPU compute capacity, but this
is no longer true with the new PELT: we can go above the compute
capacity of the CPU and remain correct as long as we are able to
increase the compute capacity before idle time runs out. In theory, the
utilization "could" be correct until we reach 1024 (for utilization or
runnable) and there is no way to catch up on the temporary lack of
compute capacity.

> the EM just can't help us with anything so we should stay away from it.
>
> I understand how just plain bailing out as we do today is sub-optimal,
> but whatever we do to improve on that can't be doing utilization-based
> task placement.
>
> Have you considered making the default (non-EAS) wake-up path a little
> more reluctant to migrations when EAS is enabled? That should allow us
> to maintain a somewhat stable task placement when OU is only transient
> (e.g. due to misfit), but without using util values when we really
> shouldn't.
>
> Thoughts?

As mentioned above, OU doesn't mean that there is no idle time anymore,
and in this case utilization is still relevant. I would be in favor of
adding more performance-related decisions into feec(), similarly to
what is done in patch 3; for example, if a CPU doesn't fit, we could
still return a CPU with more of a performance focus.
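
A rough standalone sketch of that idea, only to illustrate the shape of
such a fallback. This is not what patch 3 or feec() actually does:
struct cpu_sketch, its energy_cost field (a stand-in for the energy
model estimate) and pick_cpu_sketch() are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot; energy_cost stands in for the EM estimate. */
struct cpu_sketch {
	unsigned long util;
	unsigned long capacity;
	unsigned long energy_cost;
};

/* Task fits if CPU util plus task util stays below ~80% of capacity. */
static bool task_fits(unsigned long task_util, const struct cpu_sketch *c)
{
	return (c->util + task_util) * 1280 < c->capacity * 1024;
}

/*
 * Prefer the cheapest CPU on which the task fits. If the task fits
 * nowhere, fall back to a performance-oriented choice: the CPU with
 * the largest spare capacity. Returns -1 only when there is no CPU.
 */
static int pick_cpu_sketch(const struct cpu_sketch *cpus, size_t nr_cpus,
			   unsigned long task_util)
{
	int best_fit = -1, best_spare = -1;
	unsigned long max_spare = 0;

	for (size_t i = 0; i < nr_cpus; i++) {
		unsigned long spare = cpus[i].capacity > cpus[i].util ?
				      cpus[i].capacity - cpus[i].util : 0;

		if (task_fits(task_util, &cpus[i])) {
			if (best_fit < 0 ||
			    cpus[i].energy_cost < cpus[best_fit].energy_cost)
				best_fit = (int)i;
		} else if (best_spare < 0 || spare > max_spare) {
			best_spare = (int)i;
			max_spare = spare;
		}
	}

	return best_fit >= 0 ? best_fit : best_spare;
}
```

The point is only that the "doesn't fit" case would stay inside the same
function and return a CPU instead of bailing out to the default path.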


>
> Thanks,
> Quentin
>
> >                       new_cpu = find_energy_efficient_cpu(p, prev_cpu);
> >                       if (new_cpu >= 0)
> >                               return new_cpu;
> > --
> > 2.34.1
> >
> >
