Message-ID: <Zv2Ay9Y3TswTwW_B@google.com>
Date: Wed, 2 Oct 2024 10:20:11 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: "Markku Ahvenjärvi" <mankku@...il.com>
Cc: bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
janne.karhunen@...il.com, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
mingo@...hat.com, pbonzini@...hat.com, tglx@...utronix.de, x86@...nel.org
Subject: Re: [PATCH 1/1] KVM: nVMX: update VPPR on vmlaunch/vmresume
On Wed, Oct 02, 2024, Sean Christopherson wrote:
> On Wed, Oct 02, 2024, Sean Christopherson wrote:
> > On Wed, Oct 02, 2024, Markku Ahvenjärvi wrote:
> > > Hi Sean,
> > >
> > > > On Fri, Sep 20, 2024, Markku Ahvenjärvi wrote:
> > > > > Running certain hypervisors under KVM on VMX caused L1 hangs after
> > > > > launching a nested guest. External interrupts were not processed on
> > > > > vmlaunch/vmresume due to a stale VPPR, and the L2 guest would resume
> > > > > without allowing the L1 hypervisor to process the events.
> > > > >
> > > > > The patch ensures VPPR is updated when checking for pending
> > > > > interrupts.
> > > >
> > > > This is architecturally incorrect; PPR isn't refreshed at VM-Enter.
> > >
> > > I looked into this and found the following from Intel manual:
> > >
> > > "30.1.3 PPR Virtualization
> > >
> > > The processor performs PPR virtualization in response to the following
> > > operations: (1) VM entry; (2) TPR virtualization; and (3) EOI virtualization.
> > >
> > > ..."
> > >
> > > The section "27.3.2.5 Updating Non-Register State" further explains VM
> > > entry:
> > >
> > > "If the “virtual-interrupt delivery” VM-execution control is 1, VM entry loads
> > > the values of RVI and SVI from the guest interrupt-status field in the VMCS
> > > (see Section 25.4.2). After doing so, the logical processor first causes PPR
> > > virtualization (Section 30.1.3) and then evaluates pending virtual interrupts
> > > (Section 30.2.1). If a virtual interrupt is recognized, it may be delivered in
> > > VMX non-root operation immediately after VM entry (including any specified
> > > event injection) completes; ..."
> > >
> > > According to that, PPR is supposed to be refreshed at VM-Enter, or am I
> > > missing something here?
> >
> > Huh, I missed that. It makes sense, I guess; VM-Enter processes pending virtual
> > interrupts, so it stands to reason that VM-Enter would refresh PPR as well.
> >
> > Ugh, and looking again, KVM refreshes PPR every time it checks for a pending
> > interrupt, including the VM-Enter case (via kvm_apic_has_interrupt()) when nested
> > posted interrupts are in use:
> >
> > 	/* Emulate processing of posted interrupts on VM-Enter. */
> > 	if (nested_cpu_has_posted_intr(vmcs12) &&
> > 	    kvm_apic_has_interrupt(vcpu) == vmx->nested.posted_intr_nv) {
> > 		vmx->nested.pi_pending = true;
> > 		kvm_make_request(KVM_REQ_EVENT, vcpu);
> > 		kvm_apic_clear_irr(vcpu, vmx->nested.posted_intr_nv);
> > 	}
> >
> > I'm still curious as to what's different about your setup, but certainly not
> > curious enough to hold up a fix.
>
> Actually, none of the above is even relevant. PPR virtualization in the nested
> VM-Enter case would be for _L2's_ vPPR, not L1's. And that virtualization is
> performed by hardware (vmcs02 has the correct RVI, SVI, and vAPIC information
> for L2).
>
> Which means my initial instinct that KVM is missing a PPR update somewhere is
> likely correct.
Talking to myself :-)
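For reference, the PPR-virtualization rule from SDM 30.1.3 quoted above boils
down to roughly the following (a sketch of the architectural behavior with a
made-up helper name, not actual KVM code):

	/*
	 * Sketch of SDM 30.1.3 "PPR virtualization": the new vPPR takes
	 * whichever of vTPR or the highest in-service vector (SVI) has the
	 * higher priority class (bits 7:4); the low nibble is kept only
	 * when vTPR wins.
	 */
	static u8 virtualize_ppr(u8 vtpr, u8 svi)
	{
		if ((vtpr & 0xf0) >= (svi & 0xf0))
			return vtpr;

		return svi & 0xf0;
	}

At nested VM-Enter, hardware does that computation against L2's vAPIC (vmcs02
has the correct RVI, SVI, and vAPIC information for L2), which is why L1's vPPR
is left untouched.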
Assuming it actually fixes your issue, this is what I'm planning on posting. I
suspect KVM botches something when the deprivileged host is active, but given
that the below will allow for additional cleanups and, practically speaking, doesn't
have any downsides, I don't see any reason to withhold the hack-a-fix. Though
hopefully we'll someday figure out exactly what's broken.
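To make the failure mode concrete: an APIC only delivers a pending vector when
its priority class beats PPR's, so a vPPR stuck at a stale, high value silently
masks L1's pending IRQ. Roughly (a simplified illustration with a made-up
helper name, not KVM's actual code):

	/*
	 * A pending IRR vector is deliverable only if its priority class
	 * (bits 7:4) is strictly higher than the current PPR's class.  With
	 * a stale, overly-high vPPR this never becomes true, and the vCPU
	 * looks hung from L1's point of view.
	 */
	static bool irq_above_ppr(u8 irr_vector, u8 vppr)
	{
		return (irr_vector & 0xf0) > (vppr & 0xf0);
	}

The hack-a-fix below sidesteps the stale value by requesting event evaluation
when L1 has a pending IRQ, which refreshes vPPR before that check.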
---
From: Sean Christopherson <seanjc@...gle.com>
Date: Wed, 2 Oct 2024 08:53:23 -0700
Subject: [PATCH] KVM: nVMX: Explicitly update vPPR on successful nested
VM-Enter
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Request pending event evaluation after successful nested VM-Enter if L1
has a pending IRQ, which in turn refreshes vPPR based on vTPR and vISR.
This fixes an issue where KVM will fail to deliver a pending IRQ to L1
when running an atypical hypervisor in L1, e.g. the pKVM port to VMX.
Odds are very good that refreshing L1's vPPR is papering over a missed
PPR update somewhere, e.g. the underlying bug is likely related to the fact
that pKVM passes through its APIC to the deprivileged host (which is an
L2 guest from KVM's perspective).
However, KVM updates PPR _constantly_, even when PPR technically shouldn't
be refreshed, e.g. kvm_vcpu_has_events() re-evaluates PPR if IRQs are
unblocked, by way of the same kvm_apic_has_interrupt() check. Ditto for
nested VM-Enter itself, when nested posted interrupts are enabled. Thus,
trying to avoid a PPR update on VM-Enter just to be pedantically accurate
is ridiculous, given the behavior elsewhere in KVM.
Unconditionally checking for interrupts will also allow for additional
cleanups, e.g. the manual RVI check earlier in VM-Enter emulation by
vmx_has_apicv_interrupt() can be dropped, and the aforementioned nested
posted interrupt logic can be massaged to better honor priority between
concurrent events.
Link: https://lore.kernel.org/kvm/20230312180048.1778187-1-jason.cj.chen@intel.com
Reported-by: Markku Ahvenjärvi <mankku@...il.com>
Closes: https://lore.kernel.org/all/20240920080012.74405-1-mankku@gmail.com
Suggested-by: Markku Ahvenjärvi <mankku@...il.com>
Cc: Janne Karhunen <janne.karhunen@...il.com>
Signed-off-by: Sean Christopherson <seanjc@...gle.com>
---
arch/x86/kvm/vmx/nested.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a8e7bc04d9bf..784b61c9810b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3593,7 +3593,8 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 	 * effectively unblock various events, e.g. INIT/SIPI cause VM-Exit
 	 * unconditionally.
 	 */
-	if (unlikely(evaluate_pending_interrupts))
+	if (unlikely(evaluate_pending_interrupts) ||
+	    kvm_apic_has_interrupt(vcpu))
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
 
 	/*
base-commit: e32cde8d2bd7d251a8f9b434143977ddf13dcec6
--