linux-kernel - Re: [RFC PATCH v2 0/5] Paravirt Scheduling (Dynamic vcpu priority management)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZpcFxd_oyInfggXJ@google.com>
Date: Tue, 16 Jul 2024 16:44:05 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Joel Fernandes <joel@...lfernandes.org>, 
	Vineeth Remanan Pillai <vineeth@...byteword.org>, Ben Segall <bsegall@...gle.com>, 
	Borislav Petkov <bp@...en8.de>, Daniel Bristot de Oliveira <bristot@...hat.com>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	"H . Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>, 
	Mel Gorman <mgorman@...e.de>, Paolo Bonzini <pbonzini@...hat.com>, Andy Lutomirski <luto@...nel.org>, 
	Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>, 
	Valentin Schneider <vschneid@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, 
	Vitaly Kuznetsov <vkuznets@...hat.com>, Wanpeng Li <wanpengli@...cent.com>, 
	Suleiman Souhlal <suleiman@...gle.com>, Masami Hiramatsu <mhiramat@...nel.org>, himadrics@...ia.fr, 
	kvm@...r.kernel.org, linux-kernel@...r.kernel.org, x86@...nel.org, 
	graf@...zon.com, drjunior.org@...il.com
Subject: Re: [RFC PATCH v2 0/5] Paravirt Scheduling (Dynamic vcpu priority management)

On Fri, Jul 12, 2024, Steven Rostedt wrote:
> On Fri, 12 Jul 2024 09:44:16 -0700
> Sean Christopherson <seanjc@...gle.com> wrote:
> 
> > > All we need is a notifier that gets called at every VMEXIT.  
> > 
> > Why?  The only argument I've seen for needing to hook VM-Exit is so that the
> > host can speculatively boost the priority of the vCPU when deliverying an IRQ,
> > but (a) I'm unconvinced that is necessary, i.e. that the vCPU needs to be boosted
> > _before_ the guest IRQ handler is invoked and (b) it has almost no benefit on
> > modern hardware that supports posted interrupts and IPI virtualization, i.e. for
> > which there will be no VM-Exit.
> 
> No. The speculatively boost was for something else, but slightly
> related. I guess the ideal there was to have the interrupt coming in
> boost the vCPU because the interrupt could be waking an RT task. It may
> still be something needed, but that's not what I'm talking about here.
> 
> The idea here is when an RT task is scheduled in on the guest, we want
> to lazily boost it. As long as the vCPU is running on the CPU, we do
> not need to do anything. If the RT task is scheduled for a very short
> time, it should not need to call any hypercall. It would set the shared
> memory to the new priority when the RT task is scheduled, and then put
> back the lower priority when it is scheduled out and a SCHED_OTHER task
> is scheduled in.
> 
> Now if the vCPU gets preempted, it is this moment that we need the host
> kernel to look at the current priority of the task thread running on
> the vCPU. If it is an RT task, we need to boost the vCPU to that
> priority, so that a lower priority host thread does not interrupt it.

I got all that, but I still don't see any need to hook VM-Exit.  If the vCPU gets
preempted, the host scheduler is already getting "notified", otherwise the vCPU
would still be scheduled in, i.e. wouldn't have been preempted.

> The host should also set a bit in the shared memory to tell the guest
> that it was boosted. Then when the vCPU schedules a lower priority task
> than what is in shared memory, and the bit is set that tells the guest
> the host boosted the vCPU, it needs to make a hypercall to tell the
> host that it can lower its priority again.

Which again doesn't _need_ a dedicated/manual VM-Exit.  E.g. why force the host
to reasses the priority instead of simply waiting until the next reschedule?  If
the host is running tickless, then presumably there is a scheduling entity running
on a different pCPU, i.e. that can react to vCPU priority changes without needing
a VM-Exit.