[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zymk_EaHkk7FPqru@google.com>
Date: Mon, 4 Nov 2024 20:54:20 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Yong He <zhuangel570@...il.com>
Cc: Maxim Levitsky <mlevitsk@...hat.com>
Subject: Re: [PATCH] KVM: x86: Update irr_pending when setting APIC state with
APICv disabled
+Maxim
On Fri, Nov 01, 2024, Sean Christopherson wrote:
> Explicitly set apic->irr_pending after stuffing the vIRR when userspace
> sets APIC state and APICv is disabled, otherwise KVM will skip scanning
> the vIRR in subsequent calls to apic_find_highest_irr(), and ultimately
> fail to inject the interrupt until another interrupt happens to be added
> to the vIRR.
>
> Only the APICv-disabled case is flawed, as KVM forces apic->irr_pending to
> be true if APICv is enabled, because not all vIRR updates will be visible
> to KVM.
>
> Note, irr_pending is intentionally not updated in kvm_apic_update_apicv(),
> because when APICv is being inhibited/disabled, KVM needs to keep the flag
> set until the next emulated EOI so that KVM will correctly handle any
> in-flight updates to the vIRR from hardware. But when setting APIC state,
> neither the VM nor the VMM can assume specific ordering between an update
> from hardware and overwriting all state in kvm_apic_set_state(), thus KVM
> can safely clear irr_pending if the vIRR is empty.
>
> Reported-by: Yong He <zhuangel570@...il.com>
> Closes: https://lkml.kernel.org/r/20241023124527.1092810-1-alexyonghe%40tencent.com
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
> arch/x86/kvm/lapic.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 65412640cfc7..deb73aea2c06 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -3086,6 +3086,15 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
> kvm_x86_call(hwapic_irr_update)(vcpu,
> apic_find_highest_irr(apic));
> kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic));
> + } else {
> + /*
> + * Note, kvm_apic_update_apicv() is responsible for updating
> + * isr_count and highest_isr_cache. irr_pending is somewhat
> + * special because it mustn't be cleared when APICv is disabled
> + * at runtime, and only state restore can cause an IRR bit to
> + * be set without also refreshing irr_pending.
> + */
> + apic->irr_pending = apic_search_irr(apic) != -1;
I did a bit more archaeology in order to give this a Fixes tag (and a Cc: stable),
and found two interesting evolutions of this code.
The bug was introduced by commit 755c2bf87860 ("KVM: x86: lapic: don't touch
irr_pending in kvm_apic_update_apicv when inhibiting it"), which as the shortlog
suggests, deleted code that update irr_pending.
Before that commit, kvm_apic_update_apicv() did more or less what I am proposing
here, with the obvious difference that the proposed fix is specific to
kvm_lapic_reset().
struct kvm_lapic *apic = vcpu->arch.apic;
if (vcpu->arch.apicv_active) {
/* irr_pending is always true when apicv is activated. */
apic->irr_pending = true;
apic->isr_count = 1;
} else {
apic->irr_pending = (apic_search_irr(apic) != -1);
apic->isr_count = count_vectors(apic->regs + APIC_ISR);
}
And _that_ bug (clearing irr_pending) was introduced by commit b26a695a1d78 ("kvm:
lapic: Introduce APICv update helper function"). Prior to 97a71c444a14, KVM
unconditionally set irr_pending to true in kvm_apic_set_state(), i.e. assumed
that the new virtual APIC state could have a pending IRQ (which isn't a terrible
assumption.
Furthermore, in addition to introducing this issue, commit 755c2bf87860 also
papered over the underlying bug: KVM doesn't ensure CPUs and devices see APICv
as disabled prior to searching the IRR. Waiting until KVM emulates EOI to update
irr_pending works because KVM won't emulate EOI until after refresh_apicv_exec_ctrl(),
and because there are plenty of memory barries in between, but leaving irr_pending
set is basically hacking around bad ordering, which I _think_ can be fixed by:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83fe0a78146f..85d330b56c7e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10548,8 +10548,8 @@ void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
goto out;
apic->apicv_active = activate;
- kvm_apic_update_apicv(vcpu);
kvm_x86_call(refresh_apicv_exec_ctrl)(vcpu);
+ kvm_apic_update_apicv(vcpu);
/*
* When APICv gets disabled, we may still have injected interrupts
So, while searching the IRR is technically sufficient to fix the bug, I'm leaning
*very* strongly torward fixing this bug by unconditionally setting irr_pending
to true in kvm_apic_update_apicv(), with a FIXME to call out what KVM should be
doing. And then address that FIXME in a future series (I have a rather massive
pile of fixes and cleanups that are closely related, so there will be ample
opportunity).
From: Sean Christopherson <seanjc@...gle.com>
Date: Fri, 1 Nov 2024 12:35:32 -0700
Subject: [PATCH] KVM: x86: Unconditionally set irr_pending when updating APICv
state
TODO: writeme
Fixes: 755c2bf87860 ("KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it")
Cc: stable@...r.kernel.org
Reported-by: Yong He <zhuangel570@...il.com>
Closes: https://lkml.kernel.org/r/20241023124527.1092810-1-alexyonghe%40tencent.com
Signed-off-by: Sean Christopherson <seanjc@...gle.com>
---
arch/x86/kvm/lapic.c | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2098dc689088..95c6beb8ce27 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2629,19 +2629,26 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *apic = vcpu->arch.apic;
- if (apic->apicv_active) {
- /* irr_pending is always true when apicv is activated. */
- apic->irr_pending = true;
+ /*
+ * When APICv is enabled, KVM must always search the IRR for a pending
+ * IRQ, as other vCPUs and devices can set IRR bits even if the vCPU
+ * isn't running. If APICv is disabled, KVM _should_ search the IRR
+ * for a pending IRQ. But KVM currently doesn't ensure *all* hardware,
+ * e.g. CPUs and IOMMUs, has seen the change in state, i.e. searching
+ * the IRR at this time could race with IRQ delivery from hardware that
+ * still sees APICv as being enabled.
+ *
+ * FIXME: Ensure other vCPUs and devices observe the change in APICv
+ * state prior to updating KVM's metadata caches, so that KVM
+ * can safely search the IRR and set irr_pending accordingly.
+ */
+ apic->irr_pending = true;
+
+ if (apic->apicv_active)
apic->isr_count = 1;
- } else {
- /*
- * Don't clear irr_pending, searching the IRR can race with
- * updates from the CPU as APICv is still active from hardware's
- * perspective. The flag will be cleared as appropriate when
- * KVM injects the interrupt.
- */
+ else
apic->isr_count = count_vectors(apic->regs + APIC_ISR);
- }
+
apic->highest_isr_cache = -1;
}
base-commit: 8fe4fefefa1b9ea01557d454699c20fdf709e890
--
Powered by blists - more mailing lists