Message-ID: <591b12f8c31264d1b7c7417ed916541196eddd58.camel@amazon.com>
Date: Thu, 27 Feb 2025 08:27:00 +0000
From: "Sieber, Fernand" <sieberf@...zon.com>
To: "vincent.guittot@...aro.org" <vincent.guittot@...aro.org>
CC: "peterz@...radead.org" <peterz@...radead.org>, "mingo@...hat.com"
	<mingo@...hat.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "x86@...nel.org" <x86@...nel.org>,
	"nh-open-source@...zon.com" <nh-open-source@...zon.com>
Subject: Re: [RFC PATCH 3/3] sched, x86: Make the scheduler guest unhalted aware

On Thu, 2025-02-27 at 08:34 +0100, Vincent Guittot wrote:
> On Tue, 18 Feb 2025 at 21:27, Fernand Sieber <sieberf@...zon.com> wrote:
> > 
> > With guest hlt/mwait/pause pass through, the scheduler has no visibility
> > into real vCPU activity as it sees them all 100% active. As such, load
> > balancing cannot make informed decisions on where it is preferable to
> > colocate tasks when necessary. I.e., as far as the load balancer is
> > concerned, a halted vCPU and an idle polling vCPU look exactly the same,
> > so it may decide that either should be preempted when in reality it
> > would be preferable to preempt the idle one.
> > 
> > This commit enlightens the scheduler to real guest activity in this
> > situation. Leveraging gtime unhalted, it adds a hook for kvm to
> > communicate to the scheduler the duration that a vCPU spends halted.
> > This is then used in PELT accounting to discount it from real activity.
> > This results in better placement and overall steal time reduction.
> 
> NAK, PELT accounts for time spent by se on the CPU.

I was essentially aiming to adjust this concept to "PELT accounts for
the time spent by se *unhalted* on the CPU". Would such an adjustment
of the definition cause problems?
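To make the proposed redefinition concrete, here is a minimal
floating-point sketch of a PELT-style decayed average in which halted
time is subtracted from each period's running time before it is
accumulated. It assumes one update per 1024 us period and a decay
factor y with y^32 ~= 0.5, as in PELT; the names (struct sketch_avg,
sketch_update_period(), halted_us) are hypothetical, and this is not
the patch's code, which would use the kernel's fixed-point arithmetic.

```c
#include <assert.h>

#define SKETCH_PERIOD_US 1024.0 /* PELT period length, roughly 1 ms */
#define SKETCH_Y 0.978572       /* per-period decay factor, y^32 ~= 0.5 */

/* Hypothetical stand-in for a sched_avg: utilization in 0.0 .. 1.0. */
struct sketch_avg {
	double util;
};

/*
 * Account one full period. running_us is the time the vCPU thread was
 * on the CPU; halted_us is the part of it the guest spent halted, as a
 * hypothetical KVM hook would report it. Only unhalted time contributes.
 */
static void sketch_update_period(struct sketch_avg *avg,
				 double running_us, double halted_us)
{
	assert(running_us >= 0.0 && running_us <= SKETCH_PERIOD_US);
	assert(halted_us >= 0.0 && halted_us <= running_us);

	double contrib = (running_us - halted_us) / SKETCH_PERIOD_US;

	/* Geometric decay: older periods weigh less, recent ones more. */
	avg->util = avg->util * SKETCH_Y + contrib * (1.0 - SKETCH_Y);
}
```

With this definition, a vCPU thread that is on the CPU for every period
but halted 75% of the time converges to a utilization of about 0.25
instead of about 1.0, which is the signal the load balancer would then
see.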

> If your thread/vcpu doesn't do anything but burn cycles, find another
> way to report that to the host

The main advantage of hooking into PELT is that it means that load
balancing will just work out of the box as it immediately adjusts the
sched_group util/load/runnable values.

If hooking into PELT is not viable, it may be possible to scope my
change down to load balancing without touching PELT. For example,
instead of using PELT, we could adjust the calculation of sgs->avg_load
in update_sg_lb_stats() for overloaded groups to include a correcting
factor based on the recent halted cycles of the group's CPUs. Comparing
two overloaded groups would then favor pulling tasks onto the one with
the most halted cycles. This approach is narrower in scope: it doesn't
change the classification of scheduling groups, it only changes how
overloaded groups are compared. I would need to prototype it to see if
it works.
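A rough sketch of that alternative, assuming hypothetical halted_cycles
and total_cycles counters per group: the uncorrected formula mirrors
the overloaded case in update_sg_lb_stats() (group_load scaled by
SCHED_CAPACITY_SCALE and divided by group_capacity), and the correction
simply scales it by the unhalted fraction of recent cycles. The struct
is a trimmed-down stand-in, not the kernel's struct sg_lb_stats, and
this scaling is only one possible correcting factor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CAPACITY_SCALE 1024ULL /* mirrors SCHED_CAPACITY_SCALE */

/* Hypothetical, trimmed-down stand-in for struct sg_lb_stats. */
struct sketch_sg_stats {
	uint64_t group_load;     /* summed load of the group's CPUs */
	uint64_t group_capacity; /* summed capacity of the group's CPUs */
	uint64_t halted_cycles;  /* recent halted cycles (hypothetical) */
	uint64_t total_cycles;   /* cycles in the same window (hypothetical) */
	uint64_t avg_load;
};

/*
 * Uncorrected part follows the overloaded case of update_sg_lb_stats():
 * avg_load = group_load * SCALE / group_capacity. The sketch then
 * scales the result by the unhalted fraction of recent cycles.
 */
static void sketch_compute_avg_load(struct sketch_sg_stats *sgs)
{
	assert(sgs->group_capacity > 0);
	assert(sgs->halted_cycles <= sgs->total_cycles);

	uint64_t avg = sgs->group_load * SKETCH_CAPACITY_SCALE /
		       sgs->group_capacity;

	if (sgs->total_cycles > 0)
		avg = avg * (sgs->total_cycles - sgs->halted_cycles) /
		      sgs->total_cycles;

	sgs->avg_load = avg;
}

/*
 * Between two overloaded groups, the one with the higher avg_load is
 * treated as busier, as in the group_overloaded case of
 * update_sd_pick_busiest().
 */
static bool sketch_is_busier(const struct sketch_sg_stats *a,
			     const struct sketch_sg_stats *b)
{
	return a->avg_load > b->avg_load;
}
```

Two groups with the same load and capacity then differ only by their
halted cycles: a group whose CPUs were halted half the time ends up
with half the avg_load, so it is the less likely of the two to be
picked as the busiest group.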

Let me know whether this goes in the right direction, or whether you
have any other ideas for alternative approaches.

> Furthermore this breaks all the hierarchy dependency

I don't understand this comment; could you please provide more
details?

> 
> > 
> > This initial implementation assumes that non-idle CPUs are ticking, as
> > it hooks the unhalted time into the PELT decaying load accounting. As
> > such, it doesn't work well if PELT is updated infrequently with large
> > chunks of halted time. This is not a fundamental limitation, but more
> > complex accounting is needed to generalize the use case to nohz full.
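The update-frequency limitation can be illustrated with a small
comparison, assuming (hypothetically) that an infrequent update only
learns the total halted time for the whole window and spreads it
evenly. With one update per period, the decay weights each period's
unhalted fraction by how recent it was; with a single lumped update,
that ordering is lost. The functions are illustrative only and are not
how the patch performs its accounting.

```c
#include <assert.h>

#define SKETCH_Y 0.978572 /* per-period PELT decay factor, y^32 ~= 0.5 */

/*
 * Ticking case: one update per period, so each period's unhalted
 * fraction (0.0 .. 1.0) is decayed according to when it happened.
 */
static double sketch_util_ticking(const double *unhalted, int nr_periods)
{
	assert(nr_periods > 0);
	double util = 0.0;

	for (int i = 0; i < nr_periods; i++)
		util = util * SKETCH_Y + unhalted[i] * (1.0 - SKETCH_Y);
	return util;
}

/*
 * Infrequent-update case (hypothetical naive scheme): a single update
 * covers the whole window and only the total unhalted time is known,
 * so the same average fraction is assumed for every period.
 */
static double sketch_util_lumped(const double *unhalted, int nr_periods)
{
	assert(nr_periods > 0);
	double total = 0.0, util = 0.0;

	for (int i = 0; i < nr_periods; i++)
		total += unhalted[i];
	for (int i = 0; i < nr_periods; i++)
		util = util * SKETCH_Y + (total / nr_periods) * (1.0 - SKETCH_Y);
	return util;
}
```

Over 32 periods where the vCPU is halted for the first 16 and busy for
the last 16, the ticking version yields about 0.29 and the reverse
order about 0.21, while the lumped version yields about 0.25 for both
orders.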



Amazon Development Centre (South Africa) (Proprietary) Limited
29 Gogosoa Street, Observatory, Cape Town, Western Cape, 7925, South Africa
Registration Number: 2004 / 034463 / 07
