linux-kernel - Re: [PATCH 2/3] KVM: x86: Add support for VMware guest specific hypercalls

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [day] [month] [year] [list]

Message-ID: <ZzTNk4G0sOd8_hn-@google.com>
Date: Wed, 13 Nov 2024 16:04:19 +0000
From: Sean Christopherson <seanjc@...gle.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: Doug Covelli <doug.covelli@...adcom.com>, Zack Rusin <zack.rusin@...adcom.com>, 
	kvm <kvm@...r.kernel.org>, Jonathan Corbet <corbet@....net>, 
	Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, 
	"H. Peter Anvin" <hpa@...or.com>, Shuah Khan <shuah@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Arnaldo Carvalho de Melo <acme@...hat.com>, Isaku Yamahata <isaku.yamahata@...el.com>, 
	Joel Stanley <joel@....id.au>, linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-kselftest <linux-kselftest@...r.kernel.org>
Subject: Re: [PATCH 2/3] KVM: x86: Add support for VMware guest specific hypercalls

On Wed, Nov 13, 2024, Paolo Bonzini wrote:
> Il mar 12 nov 2024, 21:44 Doug Covelli <doug.covelli@...adcom.com> ha
> scritto:
> 
> > > Split irqchip should be the best tradeoff. Without it, moves from cr8
> > > stay in the kernel, but moves to cr8 always go to userspace with a
> > > KVM_EXIT_SET_TPR exit. You also won't be able to use Intel
> > > flexpriority (in-processor accelerated TPR) because KVM does not know
> > > which bits are set in IRR. So it will be *really* every move to cr8
> > > that goes to userspace.
> >
> > Sorry to hijack this thread but is there a technical reason not to allow
> > CR8 based accesses to the TPR (not MMIO accesses) when the in-kernel local
> > APIC is not in use?
> 
> No worries, you're not hijacking :) The only reason is that it would be
> more code for a seldom used feature and anyway with worse performance. (To
> be clear, CR8 based accesses are allowed, but stores cause an exit in order
> to check the new TPR against IRR. That's because KVM's API does not have an
> equivalent of the TPR threshold as you point out below).
> 
> > Also I could not find these documented anywhere but with MSFT's APIC our
> > monitor relies on extensions for trapping certain events such as INIT/SIPI
> > plus LINT0 and SVR writes:
> >
> > UINT64 X64ApicInitSipiExitTrap    : 1; //
> > WHvRunVpExitReasonX64ApicInitSipiTrap
> > UINT64 X64ApicWriteLint0ExitTrap  : 1; //
> > WHvRunVpExitReasonX64ApicWriteTrap
> > UINT64 X64ApicWriteLint1ExitTrap  : 1; //
> > WHvRunVpExitReasonX64ApicWriteTrap
> > UINT64 X64ApicWriteSvrExitTrap    : 1; //
> > WHvRunVpExitReasonX64ApicWriteTrap
> >
> 
> There's no need for this in KVM's in-kernel APIC model. INIT and SIPI are
> handled in the hypervisor and you can get the current state of APs via
> KVM_GET_MPSTATE. LINT0 and LINT1 are injected with KVM_INTERRUPT and
> KVM_NMI respectively, and they obey IF/PPR and NMI blocking respectively,
> plus the interrupt shadow; so there's no need for userspace to know when
> LINT0/LINT1 themselves change. The spurious interrupt vector register is
> also handled completely in kernel.
> 
> > I did not see any similar functionality for KVM.  Does anything like that
> > exist?  In any case we would be happy to add support for handling CR8
> > accesses w/o exiting w/o the in-kernel APIC along with some sort of a way
> > to configure the TPR threshold if folks are not opposed to that.
> 
> As far I know everybody who's using KVM (whether proprietary or open
> source) has had no need for that, so I don't think it's a good idea to make
> the API more complex. 

+1

> Performance of Windows guests is going to be bad anyway with userspace APIC.

Heh, on modern hardware, performance of any guest is going to suck with a userspace
APIC, compared to what is possible with an in-kernel APIC.

More importantly, I really, really don't want to encourage non-trivial usage of
a local APIC in userspace.  KVM's support for a userspace local APIC is very
poorly tested these days.   I have zero desire to spend any amount of time reviewing
and fixing issues that are unique to emulating the local APIC in userspace.  And
long term, I would love to force an in-kernel local APIC, though I don't know if
that's entirely feasible.