[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b1ddb439-9e28-4a58-ba86-0395bfc081e0@redhat.com>
Date: Wed, 13 Nov 2024 18:59:53 +0100
From: Paolo Bonzini <pbonzini@...hat.com>
To: Doug Covelli <doug.covelli@...adcom.com>
Cc: Zack Rusin <zack.rusin@...adcom.com>,
Sean Christopherson <seanjc@...gle.com>, kvm <kvm@...r.kernel.org>,
Jonathan Corbet <corbet@....net>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
the arch/x86 maintainers <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
Shuah Khan <shuah@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Isaku Yamahata <isaku.yamahata@...el.com>, Joel Stanley <joel@....id.au>,
Linux Doc Mailing List <linux-doc@...r.kernel.org>,
"Kernel Mailing List, Linux" <linux-kernel@...r.kernel.org>,
linux-kselftest <linux-kselftest@...r.kernel.org>
Subject: Re: [PATCH 2/3] KVM: x86: Add support for VMware guest specific
hypercalls
On 11/13/24 17:24, Doug Covelli wrote:
>> No worries, you're not hijacking :) The only reason is that it would
>> be more code for a seldom used feature and anyway with worse performance.
>> (To be clear, CR8 based accesses are allowed, but stores cause an exit
>> in order to check the new TPR against IRR. That's because KVM's API
>> does not have an equivalent of the TPR threshold as you point out below).
>
> I have not really looked at the code but it seems like it could also
> simplify things as CR8 would be handled more uniformly regardless of
> who is virtualizing the local APIC.
Not much because CR8 basically does not exist at all (it's just a byte
in memory) with userspace APIC. So it's not easy to make it simpler, even
though it's less uniform.
That said, there is an optimization: you only get KVM_EXIT_SET_TPR if
CR8 decreases.
>>> Also I could not find these documented anywhere but with MSFT's APIC our monitor
>>> relies on extensions for trapping certain events such as INIT/SIPI plus LINT0
>>> and SVR writes:
>>>
>>> UINT64 X64ApicInitSipiExitTrap : 1; // WHvRunVpExitReasonX64ApicInitSipiTrap
>>> UINT64 X64ApicWriteLint0ExitTrap : 1; // WHvRunVpExitReasonX64ApicWriteTrap
>>> UINT64 X64ApicWriteLint1ExitTrap : 1; // WHvRunVpExitReasonX64ApicWriteTrap
>>> UINT64 X64ApicWriteSvrExitTrap : 1; // WHvRunVpExitReasonX64ApicWriteTrap
>>
>> There's no need for this in KVM's in-kernel APIC model. INIT and
>> SIPI are handled in the hypervisor and you can get the current
>> state of APs via KVM_GET_MPSTATE. LINT0 and LINT1 are injected
>> with KVM_INTERRUPT and KVM_NMI respectively, and they obey IF/PPR
>> and NMI blocking respectively, plus the interrupt shadow; so
>> there's no need for userspace to know when LINT0/LINT1 themselves
>> change. The spurious interrupt vector register is also handled
>> completely in kernel.
>
> I realize that KVM can handle LINT0/SVR updates themselves but our
> interrupt subsystem relies on knowing the current values of these
> registers even when not virtualizing the local APIC. I suppose we
> could use KVM_GET_LAPIC to sync things up on demand but that seems
> like it might nor be great from a performance point of view.
Ah no, you're right---you want to track the CPU that has ExtINT enabled
and send KVM_INTERRUPT to that one, I guess? And you need the spurious
vector registers because writes can set the mask bit in LINTx, but
essentially you want to trap LINT0 changes.
Something like this (missing the KVM_ENABLE_CAP and KVM_CHECK_EXTENSION
code) is good, feel free to include it in your v2 (Co-developed-by
and Signed-off-by me):
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5fb29ca3263b..b7dd89c99613 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -122,6 +122,7 @@
#define KVM_REQ_HV_TLB_FLUSH \
KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34)
+#define KVM_REQ_REPORT_LINT0_ACCESS KVM_ARCH_REQ(35)
#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
@@ -775,6 +776,7 @@ struct kvm_vcpu_arch {
u64 smi_count;
bool at_instruction_boundary;
bool tpr_access_reporting;
+ bool lint0_access_reporting;
bool xfd_no_write_intercept;
u64 ia32_xss;
u64 microcode_version;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 88dc43660d23..0e070f447aa2 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1561,6 +1561,21 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
apic->divide_count));
}
+static void __report_lint0_access(struct kvm_lapic *apic, u32 value)
+{
+ struct kvm_vcpu *vcpu = apic->vcpu;
+ struct kvm_run *run = vcpu->run;
+
+ kvm_make_request(KVM_REQ_REPORT_LINT0_ACCESS, vcpu);
+ run->lint0_access.value = value;
+}
+
+static inline void report_lint0_access(struct kvm_lapic *apic, u32 value)
+{
+ if (apic->vcpu->arch.lint0_access_reporting)
+ __report_lint0_access(apic, value);
+}
+
static void __report_tpr_access(struct kvm_lapic *apic, bool write)
{
struct kvm_vcpu *vcpu = apic->vcpu;
@@ -2312,8 +2327,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
int i;
for (i = 0; i < apic->nr_lvt_entries; i++) {
- kvm_lapic_set_reg(apic, APIC_LVTx(i),
- kvm_lapic_get_reg(apic, APIC_LVTx(i)) | APIC_LVT_MASKED);
+ u32 old = kvm_lapic_get_reg(apic, APIC_LVTx(i));
+ kvm_lapic_set_reg(apic, APIC_LVTx(i), old | APIC_LVT_MASKED);
+ if (i == 0 && !(old & APIC_LVT_MASKED))
+ report_lint0_access(apic, old | APIC_LVT_MASKED);
}
apic_update_lvtt(apic);
atomic_set(&apic->lapic_timer.pending, 0);
@@ -2352,6 +2369,8 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
if (!kvm_apic_sw_enabled(apic))
val |= APIC_LVT_MASKED;
val &= apic_lvt_mask[index];
+ if (index == 0 && val != kvm_lapic_get_reg(apic, reg))
+ report_lint0_access(apic, val);
kvm_lapic_set_reg(apic, reg, val);
break;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d0d3dc3b7ef6..2b039b372c3f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10879,6 +10879,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_vcpu_flush_tlb_guest(vcpu);
#endif
+ if (kvm_check_request(KVM_REQ_REPORT_LINT0_ACCESS, vcpu)) {
+ vcpu->run->exit_reason = KVM_EXIT_LINT0_ACCESS;
+ r = 0;
+ goto out;
+ }
if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
r = 0;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 637efc055145..ec97727f9de4 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -178,6 +178,7 @@ struct kvm_xen_exit {
#define KVM_EXIT_NOTIFY 37
#define KVM_EXIT_LOONGARCH_IOCSR 38
#define KVM_EXIT_MEMORY_FAULT 39
+#define KVM_EXIT_LINT0_ACCESS 40
/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -283,6 +284,10 @@ struct kvm_run {
__u64 flags;
};
} hypercall;
+ /* KVM_EXIT_LINT0_ACCESS */
+ struct {
+ __u32 value;
+ } lint0_access;
/* KVM_EXIT_TPR_ACCESS */
struct {
__u64 rip;
For LINT1, it should be less performance critical; if it's possible
to just go through all vCPUs, and do KVM_GET_LAPIC to check who you
should send a KVM_NMI to, then I'd do that. I'd also accept a patch
that adds a VM-wide KVM_NMI ioctl that does the same in the hypervisor
if it's useful for you.
And since I've been proven wrong already, what do you need INIT/SIPI for?
Paolo
Powered by blists - more mailing lists