Message-ID: <BF4DA74F-8117-4388-842B-9FFB582C9E61@nutanix.com>
Date: Thu, 18 Sep 2025 16:05:11 +0000
From: Khushit Shah <khushit.shah@...anix.com>
To: David Woodhouse <dwmw2@...radead.org>,
Vitaly Kuznetsov
<vkuznets@...hat.com>
CC: Vitaly Kuznetsov <vkuznets@...hat.com>,
"seanjc@...gle.com"
<seanjc@...gle.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>,
Shaju Abraham <shaju.abraham@...anix.com>
Subject: Re: [BUG] [KVM/VMX] Level triggered interrupts mishandled on Windows
w/ nested virt(Credential Guard) when using split irqchip
-------------------------------------------------
RCA: KVM ignores Directed EOI bit in split-irqchip
--------------------------------------------------
We traced the issue to KVM not respecting the Directed EOI bit in the
LAPIC Spurious Interrupt Vector Register (APIC_SPIV, bit 12) when using
split-irqchip.
Per the x2APIC specification, when APIC_SPIV.DirectedEOI is set, the
LAPIC must not broadcast EOIs for level-triggered interrupts to the
IOAPIC. Instead, the guest is responsible for EOIing the IOAPIC itself
by writing the vector to the IOAPIC's EOI register.
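The rule can be captured in a few lines. This is a sketch of the architectural behaviour, not KVM code; `lapic_eoi_reaches_ioapic` is a hypothetical helper name, and the constant simply names SPIV bit 12.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SPIV bit 12: EOI-broadcast suppression, a.k.a. Directed EOI. */
#define SPIV_DIRECTED_EOI (1u << 12)

/*
 * Hypothetical helper (not a KVM function): should a LAPIC EOI for a
 * vector be forwarded to the IOAPIC?  Only level-triggered vectors are
 * broadcast at all; with SPIV bit 12 set the broadcast is suppressed and
 * the guest EOIs the IOAPIC itself through the IOAPIC EOI register.
 */
static bool lapic_eoi_reaches_ioapic(uint32_t spiv, bool level_triggered)
{
	if (!level_triggered)
		return false;
	return (spiv & SPIV_DIRECTED_EOI) == 0;
}
```

With SPIV = 0x1ff (APIC enabled, bit 12 clear) a level-triggered EOI is broadcast; with SPIV = 0x11ff it is not.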
How we confirmed the RCA:
We added a manual delay after the level-triggered interrupt EOI. Right
after the EOI, Windows performs VMRESUME to L2, injecting the same
vector; L2 services the interrupt and then writes the vector value to
0xFEC00040 (the IOAPIC EOI register).
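What that write does on an IOAPIC that has an EOI register can be sketched as follows. This is a simplified model, not QEMU's implementation: the struct and function names are hypothetical, only the EOI register is handled, and the example values (pin 21, vector 0xa1) are taken from the trace below (gsi 21, vec 161).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IOAPIC_BASE    0xFEC00000u  /* default IOAPIC MMIO base */
#define IOAPIC_REG_EOI 0x40u        /* EOI register offset (version 0x20) */
#define NUM_PINS       24

/* Hypothetical, simplified redirection-table entry. */
struct redir_entry {
	uint8_t vector;
	bool level_triggered;
	bool remote_irr;   /* set on delivery, cleared by EOI */
};

/*
 * Handle a 32-bit MMIO write to the IOAPIC page. A write to the EOI
 * register carries a vector in bits 7:0; every level-triggered entry
 * programmed with that vector has its Remote IRR cleared.
 * Returns the number of entries cleared.
 */
static int ioapic_mmio_write(struct redir_entry tbl[NUM_PINS],
			     uint64_t gpa, uint32_t val)
{
	int cleared = 0;

	if (gpa - IOAPIC_BASE != IOAPIC_REG_EOI)
		return 0;   /* IOREGSEL/IOWIN are not modelled here */
	for (int i = 0; i < NUM_PINS; i++) {
		if (tbl[i].level_triggered && tbl[i].remote_irr &&
		    tbl[i].vector == (uint8_t)val) {
			tbl[i].remote_irr = false;
			cleared++;
		}
	}
	return cleared;
}
```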
Relevant logs (abridged):
qemu-kvm 169720 [043] 3975.550049: kvm:kvm_entry: vcpu 0, rip 0xfffff8052377167e
qemu-kvm 169710 [039] 3975.550064: kvm:kvm_set_irq: gsi 21 level 1 source 0
qemu-kvm 169710 [039] 3975.550065: kvm:kvm_msi_set_irq: dst 0 vec 161 (Fixed|physical|level)
qemu-kvm 169710 [039] 3975.550065: kvm:kvm_apic_accept_irq: apicid 0 vec 161 (Fixed|level)
qemu-kvm 169710 [039] 3975.550066: kvm:kvm_apicv_accept_irq: apicid 0 vec 161 (Fixed|level)
qemu-kvm 169720 [043] 3975.550067: kvm:kvm_exit: reason EXTERNAL_INTERRUPT rip 0xfffff8052363aa34 info 0 0
qemu-kvm 169720 [043] 3975.550068: kvm:kvm_nested_vmexit: CAN'T FIND FIELD "rip"<CANT FIND FIELD exit_code>vcpu 0 reason EXTERNAL_INTERRUPT rip 0xfffff8052363aa34 info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x800000f2 error_code 0x00000000
qemu-kvm 169720 [043] 3975.550069: kvm:kvm_nested_vmexit_inject: reason EXTERNAL_INTERRUPT info1 0 info2 0 int_info 800000a1 int_info_err 0
qemu-kvm 169720 [043] 3975.550070: kvm:kvm_entry: vcpu 0, rip 0xfffff829b2a4142c
qemu-kvm 169720 [043] 3975.550072: kvm:kvm_exit: reason EOI_INDUCED rip 0xfffff829b2a85561 info a1 0
qemu-kvm 169720 [043] 3975.550072: kvm:kvm_eoi: apicid 0 vector 161
qemu-kvm 169720 [043] 3975.550073: kvm:kvm_entry: vcpu 0, rip 0xfffff829b2a85561
qemu-kvm 169720 [043] 3975.550075: kvm:kvm_exit: reason VMRESUME rip 0xfffff829b2a41308 info 0 0
qemu-kvm 169720 [043] 3975.550075: kvm:kvm_nested_vmenter: rip: 0xfffff829b2a41308 vmcs: 0x000000011b3bc000 nested_rip: 0xfffff8052363aa34 int_ctl: 0x00000000 event_inj: 0x800000a1 nested_ept=y nested_eptp: 0x00000001030a501e
=================================================== L2 Services the Interrupt ======================================
qemu-kvm 169720 [043] 3975.550123: kvm:kvm_nested_vmexit: CAN'T FIND FIELD "rip"<CANT FIND FIELD exit_code>vcpu 0 reason MSR_WRITE rip 0xfffff8052362d36c info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
qemu-kvm 169720 [043] 3975.550124: kvm:kvm_nested_vmexit_inject: reason MSR_WRITE info1 0 info2 0 int_info 0 int_info_err 0
qemu-kvm 169720 [043] 3975.550125: kvm:kvm_entry: vcpu 0, rip 0xfffff829b2a4142c
qemu-kvm 169720 [043] 3975.550127: kvm:kvm_exit: reason EPT_VIOLATION rip 0xfffff829b2b04b82 info d82 0
qemu-kvm 169720 [043] 3975.550127: kvm:kvm_page_fault: vcpu 0 rip 0xfffff829b2b04b82 address 0x00000000fec00040 error_code 0xd82
qemu-kvm 169720 [043] 3975.550130: kvm:kvm_emulate_insn: 0:fffff829b2b04b82: 89 48 40
qemu-kvm 169720 [043] 3975.550131: kvm:vcpu_match_mmio: gva 0xfffff827a2606040 gpa 0xfec00040 Write GPA
qemu-kvm 169720 [043] 3975.550131: kvm:kvm_mmio: mmio write len 4 gpa 0xfec00040 val 0xa1
qemu-kvm 169720 [043] 3975.550131: kvm:kvm_fpu: unload
qemu-kvm 169720 [043] 3975.550132: kvm:kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
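The event-injection fields in the trace use the VMX interruption-information layout (vector in bits 7:0, type in bits 10:8, valid in bit 31). Decoding them shows that the event injected into L2 (0x800000a1) is vector 161, the same vector L1 EOIs in kvm_eoi and L2 later writes to 0xfec00040 (val 0xa1). The decoder below is an illustrative sketch; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of the VMX interruption-information format. */
struct intr_info {
	uint8_t vector;   /* bits 7:0 */
	uint8_t type;     /* bits 10:8, 0 = external interrupt */
	bool valid;       /* bit 31 */
};

static struct intr_info decode_intr_info(uint32_t raw)
{
	struct intr_info d = {
		.vector = raw & 0xff,
		.type = (raw >> 8) & 0x7,
		.valid = (raw >> 31) & 1,
	};
	return d;
}
```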
When APIC_SPIV.DirectedEOI is set, Windows expects that the LAPIC will
not EOI the IOAPIC. KVM, however, still forwards the EOI to the IOAPIC in
userspace before L2 has serviced the interrupt. Because the device has
not been serviced, the level-triggered line is still asserted, so as
soon as the EOI clears Remote IRR the IOAPIC delivers the interrupt
again. This loop repeats and Windows makes no progress.
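A toy model of the livelock, assuming a single level-triggered pin and simplifying the nested flow to "the re-delivered interrupt is taken by L1 before L2 gets to run". The names and the round cap are hypothetical; this only illustrates why a premature EOI on an asserted line re-fires, it is not KVM or QEMU code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one level-triggered IOAPIC pin. */
struct pin {
	bool line_asserted;   /* device still requests service */
	bool remote_irr;      /* set on delivery, cleared by EOI */
};

static int deliveries;

/* The IOAPIC delivers when the line is high and no EOI is outstanding. */
static void ioapic_maybe_deliver(struct pin *p)
{
	if (p->line_asserted && !p->remote_irr) {
		p->remote_irr = true;
		deliveries++;
	}
}

static void ioapic_eoi(struct pin *p)
{
	p->remote_irr = false;
	ioapic_maybe_deliver(p);   /* line still high => deliver again */
}

/*
 * forward_lapic_eoi = true models the LAPIC EOI reaching the IOAPIC
 * despite Directed EOI; false models Directed EOI being honoured.
 * Returns how many times the interrupt was delivered.
 */
static int simulate(bool forward_lapic_eoi, int max_rounds)
{
	struct pin p = { .line_asserted = true, .remote_irr = false };

	assert(max_rounds > 0);
	deliveries = 0;
	ioapic_maybe_deliver(&p);
	for (int round = 0; round < max_rounds && p.remote_irr; round++) {
		if (forward_lapic_eoi) {
			int before = deliveries;
			ioapic_eoi(&p);            /* premature EOI */
			if (deliveries != before)
				continue;          /* re-fired before L2 ran */
		}
		/* L2 services the device, then EOIs the IOAPIC directly. */
		p.line_asserted = false;
		ioapic_eoi(&p);
	}
	return deliveries;
}
```

In this model, honouring Directed EOI yields exactly one delivery, while forwarding the LAPIC EOI re-delivers once per round until the cap is hit.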
Why this is Intel + split-irqchip only:
This is not seen on AMD with split-irqchip because Windows does not set
APIC_SPIV.DirectedEOI in that configuration. We do not see it with
kernel-irqchip because the Directed EOI capability is only advertised
when the IOAPIC is not in the kernel (ref:
/arch/x86/kvm/lapic.c:kvm_apic_set_version). That restriction exists
because the in-kernel IOAPIC implementation has no EOI register (IOAPIC
version 0x11), whereas QEMU's default IOAPIC implementation does (IOAPIC
version 0x20).
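The version is visible to the guest through the IOAPIC version register (index 0x01: version in bits 7:0, maximum redirection entry in bits 23:16). A sketch of the check follows; the helper names are hypothetical, the register values in the test are illustrative for a 24-pin IOAPIC, and treating every version at or above 0x20 as having an EOI register is an assumption beyond the two versions named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IOAPIC version register: bits 7:0 version, bits 23:16 max redir entry. */
static uint8_t ioapic_version(uint32_t ver_reg)
{
	return ver_reg & 0xff;
}

static unsigned ioapic_num_pins(uint32_t ver_reg)
{
	return ((ver_reg >> 16) & 0xff) + 1;
}

/* Assumption: 0x11 has no EOI register, 0x20 and newer do. */
static bool ioapic_has_eoi_register(uint32_t ver_reg)
{
	return ioapic_version(ver_reg) >= 0x20;
}
```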
This may also be the actual RCA behind
commit 958a01dab8e02fc49f4fd619fad8c82a1108afdb
Author: Vitaly Kuznetsov <vkuznets@...hat.com>
Date: Tue Apr 2 10:02:15 2019 +0200
ioapic: allow buggy guests mishandling level-triggered interrupts to make progress
Patch:
https://patchwork.kernel.org/project/kvm/patch/20250918162529.640943-1-jon@nutanix.com/
Additional finding (potential separate bug)
In /arch/x86/kvm/lapic.c:kvm_apic_set_version, Directed EOI support is
exposed to the guest whenever the guest has x2APIC and the IOAPIC is not
in-kernel:
	if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC) &&
	    !ioapic_in_kernel(vcpu->kvm))
		v |= APIC_LVR_DIRECTED_EOI;
	kvm_lapic_set_reg(apic, APIC_LVR, v);
This assumes the userspace IOAPIC always supports Directed EOI. However,
QEMU's IOAPIC version can be 0x11, which has no EOI register and
therefore cannot support Directed EOI. We should track this separately.
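The mismatch can be written as a consistency check. The first predicate mirrors the condition quoted from kvm_apic_set_version; the second flags configurations where Directed EOI is advertised but the IOAPIC (assumed, as above, to need version 0x20 or newer for an EOI register) gives the guest no way to EOI it. Both function names are hypothetical; this is a sketch, not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the LVR condition in kvm_apic_set_version(). */
static bool lvr_advertises_directed_eoi(bool guest_has_x2apic,
					bool ioapic_in_kernel)
{
	return guest_has_x2apic && !ioapic_in_kernel;
}

/* Directed EOI is only usable if the guest can EOI the IOAPIC directly. */
static bool directed_eoi_config_ok(bool guest_has_x2apic,
				   bool ioapic_in_kernel, uint8_t ioapic_ver)
{
	bool advertised = lvr_advertises_directed_eoi(guest_has_x2apic,
						      ioapic_in_kernel);

	return !advertised || ioapic_ver >= 0x20;
}
```

Under these assumptions, split-irqchip with an x2APIC guest and a 0x11 userspace IOAPIC is the one combination flagged as inconsistent.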
Thanks for all the help!
Regards,
Khushit