[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZytHAtKgixnZ/AOD@intel.com>
Date: Wed, 6 Nov 2024 18:37:54 +0800
From: Chao Gao <chao.gao@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Paolo Bonzini <pbonzini@...hat.com>, <kvm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, Maxim Levitsky <mlevitsk@...hat.com>, Yong He
<zhuangel570@...il.com>
Subject: Re: [PATCH v2] KVM: x86: Unconditionally set irr_pending when
updating APICv state
On Tue, Nov 05, 2024 at 05:51:35PM -0800, Sean Christopherson wrote:
>Always set irr_pending (to true) when updating APICv status to fix a bug
>where KVM fails to set irr_pending when userspace sets APIC state and
>APICv is disabled, which ultimate results in KVM failing to inject the
>pending interrupt(s) that userspace stuffed into the vIRR, until another
>interrupt happens to be emulated by KVM.
>
>Only the APICv-disabled case is flawed, as KVM forces apic->irr_pending to
>be true if APICv is enabled, because not all vIRR updates will be visible
>to KVM.
>
>Hit the bug with a big hammer, even though strictly speaking KVM can scan
>the vIRR and set/clear irr_pending as appropriate for this specific case.
>The bug was introduced by commit 755c2bf87860 ("KVM: x86: lapic: don't
>touch irr_pending in kvm_apic_update_apicv when inhibiting it"), which as
>the shortlog suggests, deleted code that updated irr_pending.
>
>Before that commit, kvm_apic_update_apicv() did indeed scan the vIRR, with
>with the crucial difference that kvm_apic_update_apicv() did the scan even
>when APICv was being *disabled*, e.g. due to an AVIC inhibition.
>
> struct kvm_lapic *apic = vcpu->arch.apic;
>
> if (vcpu->arch.apicv_active) {
> /* irr_pending is always true when apicv is activated. */
> apic->irr_pending = true;
> apic->isr_count = 1;
> } else {
> apic->irr_pending = (apic_search_irr(apic) != -1);
> apic->isr_count = count_vectors(apic->regs + APIC_ISR);
> }
>
>And _that_ bug (clearing irr_pending) was introduced by commit b26a695a1d78
>("kvm: lapic: Introduce APICv update helper function"), prior to which KVM
>unconditionally set irr_pending to true in kvm_apic_set_state(), i.e.
>assumed that the new virtual APIC state could have a pending IRQ.
>
>Furthermore, in addition to introducing this issue, commit 755c2bf87860
>also papered over the underlying bug: KVM doesn't ensure CPUs and devices
>see APICv as disabled prior to searching the IRR. Waiting until KVM
>emulates an EOI to update irr_pending "works", but only because KVM won't
>emulate EOI until after refresh_apicv_exec_ctrl(), and there are plenty of
>memory barriers in between. I.e. leaving irr_pending set is basically
>hacking around bad ordering.
>
>So, effectively revert to the pre-b26a695a1d78 behavior for state restore,
>even though it's sub-optimal if no IRQs are pending, in order to provide a
>minimal fix, but leave behind a FIXME to document the ugliness. With luck,
>the ordering issue will be fixed and the mess will be cleaned up in the
>not-too-distant future.
>
>Fixes: 755c2bf87860 ("KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it")
>Cc: stable@...r.kernel.org
>Cc: Maxim Levitsky <mlevitsk@...hat.com>
>Reported-by: Yong He <zhuangel570@...il.com>
>Closes: https://lkml.kernel.org/r/20241023124527.1092810-1-alexyonghe%40tencent.com
>Signed-off-by: Sean Christopherson <seanjc@...gle.com>
>---
>
>v2: Go with a big hammer fix, and plan on scanning the vIRR for all cases
> once the ordering bug has been resolved, i.e. once KVM guarantees the
> scan happens after CPUs and devices see the new APICv state.
>
>v1: https://lore.kernel.org/all/20241101193532.1817004-1-seanjc@google.com
>
> arch/x86/kvm/lapic.c | 29 ++++++++++++++++++-----------
> 1 file changed, 18 insertions(+), 11 deletions(-)
>
>diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>index 65412640cfc7..e470061b744a 100644
>--- a/arch/x86/kvm/lapic.c
>+++ b/arch/x86/kvm/lapic.c
>@@ -2629,19 +2629,26 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
> {
> struct kvm_lapic *apic = vcpu->arch.apic;
>
>- if (apic->apicv_active) {
>- /* irr_pending is always true when apicv is activated. */
>- apic->irr_pending = true;
>+ /*
>+ * When APICv is enabled, KVM must always search the IRR for a pending
>+ * IRQ, as other vCPUs and devices can set IRR bits even if the vCPU
>+ * isn't running. If APICv is disabled, KVM _should_ search the IRR
>+ * for a pending IRQ. But KVM currently doesn't ensure *all* hardware,
>+ * e.g. CPUs and IOMMUs, has seen the change in state, i.e. searching
>+ * the IRR at this time could race with IRQ delivery from hardware that
>+ * still sees APICv as being enabled.
>+ *
>+ * FIXME: Ensure other vCPUs and devices observe the change in APICv
>+ * state prior to updating KVM's metadata caches, so that KVM
>+ * can safely search the IRR and set irr_pending accordingly.
>+ */
>+ apic->irr_pending = true;
Should irr_pending be cleared after the first search of IRR that finds no
pending IRQ, i.e., in apic_find_highest_irr() when !apic->apicv_active?
Otherwise, irr_pending will be out of sync until the arrival of an interrupt.
Not sure if we want to avoid the unnecessary performance overhead of repeatedly
searching IRR.
Powered by blists - more mailing lists