linux-kernel - Re: [RFC PATCH 0/3] kvm,sched: Add gtime halted

Message-ID: <Z7-A76KjcYB8HAP8@google.com>
Date: Wed, 26 Feb 2025 13:00:31 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Fernand Sieber <sieberf@...zon.com>
Cc: "x86@...nel.org" <x86@...nel.org>, "peterz@...radead.org" <peterz@...radead.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "mingo@...hat.com" <mingo@...hat.com>, 
	"vincent.guittot@...aro.org" <vincent.guittot@...aro.org>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>, 
	"nh-open-source@...zon.com" <nh-open-source@...zon.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>
Subject: Re: [RFC PATCH 0/3] kvm,sched: Add gtime halted

On Wed, Feb 26, 2025, Fernand Sieber wrote:
> On Tue, 2025-02-25 at 18:17 -0800, Sean Christopherson wrote:
> > > In this RFC we introduce the concept of guest halted time to address
> > > these concerns. Guest halted time (gtime_halted) accounts for cycles
> > > spent in guest mode while the cpu is halted. gtime_halted relies on
> > > measuring the mperf msr register (x86) around VM enter/exits to compute
> > > the number of unhalted cycles; halted cycles are then derived from the
> > > tsc difference minus the mperf difference.
> > 
> > IMO, there are better ways to solve this than having KVM sample MPERF on
> > every entry and exit.
> > 
> > The kernel already samples APERF/MPERF on every tick and provides that
> > information via /proc/cpuinfo, just use that.  If your userspace is unable
> > to use /proc/cpuinfo or similar, that needs to be explained.
> 
> If I understand correctly, what you are suggesting is to have userspace
> regularly sample these values to detect the most idle CPUs and then
> use CPU affinity to repin housekeeping tasks to them. While that is
> possible, it essentially requires implementing another scheduling
> layer in userspace through constant re-pinning of tasks. It also
> requires constantly identifying the full set of tasks that can induce
> undesirable overhead so that they can be pinned accordingly. For these
> reasons we would rather have the logic implemented directly in
> the scheduler.
> 
> > And if you're running vCPUs on tickless CPUs, and you're doing HLT/MWAIT
> > passthrough, *and* you want to schedule other tasks on those CPUs, then IMO
> > you're abusing all of those things and it's not KVM's problem to solve,
> > especially now that sched_ext is a thing.
> 
> We are running vCPUs with ticks; the rest of your observations are
> correct.

If there's a host tick, why do you need KVM's help to make scheduling decisions?
It sounds like what you want is a scheduler that is primarily driven by MPERF
(and APERF?), and sched_tick() => arch_scale_freq_tick() already knows about MPERF.
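
For reference, the arithmetic described in the RFC cover letter (halted cycles
derived from the TSC difference minus the MPERF difference) can be sketched as
below. This is only an illustration, not code from the patch series: the
struct and function names are hypothetical, and it assumes both counters have
already been scaled to the same rate and that the samples were taken on the
same CPU.

```c
#include <stdint.h>

/*
 * Hypothetical snapshot of the two counters, taken at the start and end of
 * a measurement window (e.g. around VM enter/exit, or on consecutive ticks).
 * Assumption: tsc and mperf are expressed in the same units.
 */
struct cycle_sample {
	uint64_t tsc;   /* advances whether or not the CPU is halted */
	uint64_t mperf; /* advances only while the CPU is unhalted */
};

/* Unhalted cycles in the window: simply the MPERF delta. */
static uint64_t unhalted_cycles(struct cycle_sample start,
				struct cycle_sample end)
{
	return end.mperf - start.mperf;
}

/*
 * Halted cycles in the window: TSC delta minus MPERF delta.
 * Clamp to zero so a small skew between the two reads cannot underflow.
 */
static uint64_t halted_cycles(struct cycle_sample start,
			      struct cycle_sample end)
{
	uint64_t tsc_delta = end.tsc - start.tsc;
	uint64_t mperf_delta = unhalted_cycles(start, end);

	return tsc_delta > mperf_delta ? tsc_delta - mperf_delta : 0;
}
```

The same subtraction applies regardless of where the samples are taken; only
the sampling point (every VM enter/exit versus every tick) differs between the
approaches discussed above.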