Message-ID: <aRGPcYE4liEI+DfT@intel.com>
Date: Mon, 10 Nov 2025 15:08:33 +0800
From: Chao Gao <chao.gao@...el.com>
To: Dongli Zhang <dongli.zhang@...cle.com>
CC: <kvm@...r.kernel.org>, <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<seanjc@...gle.com>, <pbonzini@...hat.com>, <tglx@...utronix.de>,
<mingo@...hat.com>, <bp@...en8.de>, <dave.hansen@...ux.intel.com>,
<hpa@...or.com>, <joe.jin@...cle.com>, <alejandro.j.jimenez@...cle.com>
Subject: Re: [PATCH v2 1/1] KVM: VMX: configure SVI during runtime APICv
activation
On Sun, Nov 09, 2025 at 10:32:12PM -0800, Dongli Zhang wrote:
>APICv (apic->apicv_active) can be activated or deactivated at runtime,
>for instance, because of APICv inhibit reasons. Intel VMX employs different
>mechanisms to virtualize the LAPIC depending on whether APICv is active.
>
>When APICv is activated at runtime, GUEST_INTR_STATUS is used to configure
>and report the current pending IRR and ISR states. Unless a specific vector
>is explicitly included in EOI_EXIT_BITMAP, its EOI will not be trapped to
>KVM. Intel VMX automatically clears the corresponding ISR bit based on the
>GUEST_INTR_STATUS.SVI field.
>
>When APICv is deactivated at runtime, the VM_ENTRY_INTR_INFO_FIELD is used
>to specify the next interrupt vector to invoke upon VM-entry. The
>VMX IDT_VECTORING_INFO_FIELD is used to report un-invoked vectors on
>VM-exit. EOIs are always trapped to KVM, so the software can manually clear
>pending ISR bits.
>
>There are scenarios where, with APICv activated at runtime, a guest-issued
>EOI may not be able to clear the pending ISR bit.
>
>Taking vector 236 as an example, here is one scenario.
>
>1. Suppose APICv is inactive. Vector 236 is pending in the IRR.
>2. To handle KVM_REQ_EVENT, KVM moves vector 236 from the IRR to the ISR,
>and configures the VM_ENTRY_INTR_INFO_FIELD via vmx_inject_irq().
>3. After VM-entry, vector 236 is invoked through the guest IDT. At this
>point, the data in VM_ENTRY_INTR_INFO_FIELD is no longer valid. The guest
>interrupt handler for vector 236 is invoked.
>4. Suppose a VM exit occurs very early in the guest interrupt handler,
>before the EOI is issued.
>5. Nothing is reported through the IDT_VECTORING_INFO_FIELD because
>vector 236 has already been invoked in the guest.
>6. Now, suppose APICv is activated. Before the next VM-entry, KVM calls
>kvm_vcpu_update_apicv() to activate APICv.
>7. Unfortunately, GUEST_INTR_STATUS.SVI is not configured, although
>vector 236 is still pending in the ISR.
>8. After VM-entry, the guest finally issues the EOI for vector 236.
>However, because SVI is not configured, vector 236 is not cleared.
>9. ISR is stalled forever on vector 236.
>
>Here is another scenario.
>
>1. Suppose APICv is inactive. Vector 236 is pending in the IRR.
>2. To handle KVM_REQ_EVENT, KVM moves vector 236 from the IRR to the ISR,
>and configures the VM_ENTRY_INTR_INFO_FIELD via vmx_inject_irq().
>3. A VM-exit occurs immediately after the next VM-entry. Vector 236 is
>not invoked through the guest IDT. Instead, it is saved to the
>IDT_VECTORING_INFO_FIELD during the VM-exit.
>4. KVM calls kvm_queue_interrupt() to re-queue the un-invoked vector 236
>into vcpu->arch.interrupt. A KVM_REQ_EVENT is requested.
>5. Now, suppose APICv is activated. Before the next VM-entry, KVM calls
>kvm_vcpu_update_apicv() to activate APICv.
>6. Although APICv is now active, KVM still uses the legacy
>VM_ENTRY_INTR_INFO_FIELD to re-inject vector 236. GUEST_INTR_STATUS.SVI is
>not configured.
>7. After the next VM-entry, vector 236 is invoked through the guest IDT.
>Finally, an EOI occurs. However, due to the lack of GUEST_INTR_STATUS.SVI
>configuration, vector 236 is not cleared from the ISR.
>8. ISR is stalled forever on vector 236.
>
>Using QEMU as an example, vector 236 is stuck in ISR forever.
>
>(qemu) info lapic 1
>dumping local APIC state for CPU 1
>
>LVT0 0x00010700 active-hi edge masked ExtINT (vec 0)
>LVT1 0x00010400 active-hi edge masked NMI
>LVTPC 0x00000400 active-hi edge NMI
>LVTERR 0x000000fe active-hi edge Fixed (vec 254)
>LVTTHMR 0x00010000 active-hi edge masked Fixed (vec 0)
>LVTT 0x000400ec active-hi edge tsc-deadline Fixed (vec 236)
>Timer DCR=0x0 (divide by 2) initial_count = 0 current_count = 0
>SPIV 0x000001ff APIC enabled, focus=off, spurious vec 255
>ICR 0x000000fd physical edge de-assert no-shorthand
>ICR2 0x00000000 cpu 0 (X2APIC ID)
>ESR 0x00000000
>ISR 236
>IRR 37(level) 236
>
>The issue does not apply to AMD SVM, which employs a different LAPIC
>virtualization mechanism. In addition, APICV_INHIBIT_REASON_IRQWIN ensures
>AMD SVM AVIC is not activated until the last interrupt has been EOI'd.
>
>Fix the bug by configuring Intel VMX GUEST_INTR_STATUS.SVI if APICv is
>activated at runtime.
>
>Signed-off-by: Dongli Zhang <dongli.zhang@...cle.com>
Reviewed-by: Chao Gao <chao.gao@...el.com>