Message-ID: <CAKfTPtBoVCnoO+vScNXRqXXWwRBT0MGOqeeAZ4VeAB+pPZVrCw@mail.gmail.com>
Date: Thu, 27 Feb 2025 10:03:58 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: "Sieber, Fernand" <sieberf@...zon.com>
Cc: "peterz@...radead.org" <peterz@...radead.org>, "mingo@...hat.com" <mingo@...hat.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "x86@...nel.org" <x86@...nel.org>,
"nh-open-source@...zon.com" <nh-open-source@...zon.com>
Subject: Re: [RFC PATCH 3/3] sched, x86: Make the scheduler guest unhalted aware
On Thu, 27 Feb 2025 at 09:27, Sieber, Fernand <sieberf@...zon.com> wrote:
>
> On Thu, 2025-02-27 at 08:34 +0100, Vincent Guittot wrote:
> > On Tue, 18 Feb 2025 at 21:27, Fernand Sieber <sieberf@...zon.com>
> > wrote:
> > >
> > > With guest hlt/mwait/pause pass through, the scheduler has no
> > > visibility into
> > > real vCPU activity as it sees them all 100% active. As such, load
> > > balancing
> > > cannot make informed decisions on where it is preferable to
> > > collocate
> > > tasks when necessary. I.e., as far as the load balancer is concerned,
> > > a
> > > halted vCPU and an idle polling vCPU look exactly the same so it
> > > may decide
> > > that either should be preempted when in reality it would be
> > > preferable to
> > > preempt the idle one.
> > >
> > > This commit enlightens the scheduler to real guest activity in
> > > this
> > > situation. Leveraging gtime unhalted, it adds a hook for kvm to
> > > communicate
> > > to the scheduler the duration that a vCPU spends halted. This is
> > > then used in
> > > PELT accounting to discount it from real activity. This results in
> > > better
> > > placement and overall steal time reduction.
> >
> > NAK, PELT accounts for the time spent by a se on the CPU.
>
> I was essentially aiming to adjust this concept to "PELT accounts for
> the time spent by se *unhalted* on the CPU". Would such an adjustment
> of the definition cause problems?
Yes. It's not in the scope of PELT to know that a se is a vcpu or
whether this vcpu is halted or not.
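
To illustrate the point, here is a toy userspace sketch of a PELT-style
decaying average. It is not the kernel implementation: the kernel uses
fixed-point arithmetic and more state, and the names toy_pelt,
toy_pelt_update and toy_pelt_util are made up for this example. The only
facts assumed are that time is accounted in 1024us periods and that a
contribution is halved after 32 periods. The input is simply whether the
se was running on the CPU during the period; nothing here knows what the
se was doing while it ran.

```c
#define PELT_PERIOD_US 1024.0
/* Approximation of 0.5^(1/32): a contribution halves every 32 periods. */
#define PELT_DECAY_Y 0.97857206

/* Hypothetical toy state: one geometrically decayed running-time sum. */
struct toy_pelt {
	double sum;
};

/* Account one 1024us period; 'running' is 1 if the se was on the CPU. */
static void toy_pelt_update(struct toy_pelt *p, int running)
{
	p->sum = p->sum * PELT_DECAY_Y + (running ? PELT_PERIOD_US : 0.0);
}

/* Utilization in [0, 1): decayed sum over the limit of an always-running se. */
static double toy_pelt_util(const struct toy_pelt *p)
{
	const double max_sum = PELT_PERIOD_US / (1.0 - PELT_DECAY_Y);

	return p->sum / max_sum;
}
```

Starting from zero, 32 consecutive running periods bring the toy
utilization to about one half, and 32 further idle periods halve it
again.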
>
> > If your thread/vcpu doesn't do anything but burn cycles, find another
> > way to report that to the host
>
> The main advantage of hooking into PELT is that it means that load
> balancing will just work out of the box as it immediately adjusts the
> sched_group util/load/runnable values.
>
> It may be possible to scope down my change to load balancing without
> touching PELT if that is not viable. For example instead of using PELT
> we could potentially adjust the calculation of sgs->avg_load in
> update_sg_lb_stats for overloaded groups to include a correcting factor
> based on recent halted cycles of the CPU. The comparison of two
> overloaded groups would then favor pulling tasks on the one that has
> the most halted cycles. This approach is more scoped down as it doesn't
> change the classification of scheduling groups, instead it just changes
> how overloaded groups are compared. I would need to prototype to see if
> it works.
This is not better than the PELT approach.
>
> Let me know if this would go in the right direction or if you have any
> other ideas of alternate options?
The thread below should give you some insights:
https://lore.kernel.org/kvm/CAO7JXPhMfibNsX6Nx902PRo7_A2b4Rnc3UP=bpKYeOuQnHvtrw@mail.gmail.com/
I don't think that you need any change in the scheduler. Use the
current public scheduler interfaces to adjust the priority of your
vcpu. As an example, switching your thread to SCHED_IDLE is a good way
to say that your thread has a very low priority, and the scheduler is
able to handle such information.
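
A minimal sketch of that suggestion, assuming Linux and glibc: the
calling thread moves itself to SCHED_IDLE through the existing
sched_setscheduler() interface. The helper name set_self_sched_idle is
made up for this example, and when a VMM should switch a vcpu thread in
and out of SCHED_IDLE is not covered here.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>

/* Fallback in case the libc headers do not expose it; 5 is the Linux value. */
#ifndef SCHED_IDLE
#define SCHED_IDLE 5
#endif

/*
 * Hypothetical helper: switch the calling thread to SCHED_IDLE.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int set_self_sched_idle(void)
{
	struct sched_param param;

	memset(&param, 0, sizeof(param));
	param.sched_priority = 0;	/* must be 0 for SCHED_IDLE */

	/* pid 0 means "the calling thread". */
	return sched_setscheduler(0, SCHED_IDLE, &param);
}
```

Moving from the default policy to SCHED_IDLE normally needs no special
privilege; moving back may be subject to the usual permission and
resource-limit checks.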
>
> > Furthermore this breaks all the hierarchy dependency
>
> I do not understand this comment, could you please provide more
> details?
>
> >
> > >
> > > This initial implementation assumes that non-idle CPUs are ticking
> > > as it
> > > hooks the unhalted time into the PELT decaying load accounting. As such
> > > it
> > > doesn't work well if PELT is updated infrequently with large chunks
> > > of
> > > halted time. This is not a fundamental limitation but more complex
> > > accounting is needed to generalize the use case to nohz full.
>
>
>
> Amazon Development Centre (South Africa) (Proprietary) Limited
> 29 Gogosoa Street, Observatory, Cape Town, Western Cape, 7925, South Africa
> Registration Number: 2004 / 034463 / 07