linux-kernel - [RFC PATCH 3/3] KVM: SVM: Optimize IRQ window inhibit handling

Message-ID: <55adf9e49743b8027231d66d79369b774a353536.1752819570.git.naveen@kernel.org>
Date: Fri, 18 Jul 2025 12:13:36 +0530
From: "Naveen N Rao (AMD)" <naveen@...nel.org>
To: Sean Christopherson <seanjc@...gle.com>,
	Paolo Bonzini <pbonzini@...hat.com>
Cc: <kvm@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>,
	Maxim Levitsky <mlevitsk@...hat.com>,
	Vasant Hegde <vasant.hegde@....com>,
	Suravee Suthikulpanit <suravee.suthikulpanit@....com>
Subject: [RFC PATCH 3/3] KVM: SVM: Optimize IRQ window inhibit handling

IRQ windows represent times during which an IRQ can be injected into a
vCPU, and thus represent times when a vCPU is running with RFLAGS.IF=1
and GIF enabled (TPR/PPR don't matter since KVM controls interrupt
injection and it only injects one interrupt at a time). On SVM, when
emulating the local APIC (i.e., AVIC disabled), KVM detects IRQ windows
by injecting a dummy virtual interrupt through VMCB.V_IRQ and
intercepting virtual interrupts (INTERCEPT_VINTR). This intercept
triggers as soon as the guest enables interrupts and is about to take
the dummy interrupt, at which point the actual interrupt can be injected
through VMCB.EVENTINJ.

When AVIC is enabled, VMCB.V_IRQ is ignored by the hardware and so
detecting IRQ windows requires AVIC to be inhibited. However, this is
only necessary for ExtINTs since all other interrupts can be injected
either by directly setting IRR in the APIC backing page and letting the
AVIC hardware inject the interrupt into the guest, or via VMCB.V_NMI for
NMIs.

If AVIC is enabled but inhibited for some other reason, KVM still has to
request an IRQ window inhibit every time it has to inject an interrupt
into the guest. This is because APICv inhibits are dynamic in nature, so
KVM has to be sure that AVIC is inhibited for purposes of discovering an
IRQ window even if the other inhibit is cleared in the meantime.

This is particularly problematic with APICV_INHIBIT_REASON_PIT_REINJ,
which stays set throughout the life of the guest: KVM ends up rapidly
toggling the IRQ window inhibit, which causes contention on
apicv_update_lock.

Address this by setting and clearing APICV_INHIBIT_REASON_IRQWIN
lazily: if some other inhibit reason is already set, just increment the
IRQ window request count and do not update apicv_inhibit_reasons
immediately. If any other inhibit reason is set/cleared in the meantime,
re-evaluate APICV_INHIBIT_REASON_IRQWIN by checking the IRQ window
request count and update apicv_inhibit_reasons appropriately. Otherwise,
just the IRQ window request count is incremented/decremented each time
an IRQ window is requested. This removes much of the contention on the
apicv_update_lock semaphore and, with it, much of the performance
degradation.
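
For illustration only, the lazy scheme can be modelled in plain userspace
C. This is a single-threaded sketch, not the kernel code: struct vm_model,
the REASON_* values, and the write_locks counter are hypothetical
stand-ins, and real locking/atomics are deliberately left out. It only
shows that the IRQ window bit is derived from the request count whenever
the inhibit mask is re-evaluated, and that the fast path never touches
the mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's inhibit reason bits. */
enum { REASON_PIT_REINJ = 0, REASON_IRQWIN = 1 };
#define BIT(n) (1UL << (n))

struct vm_model {
	unsigned long inhibit_reasons;	/* models apicv_inhibit_reasons */
	int nr_irq_window_req;		/* models apicv_nr_irq_window_req */
	unsigned long write_locks;	/* counts would-be write-lock acquisitions */
};

/*
 * Models the slow path that would run with the lock held for write.
 * The IRQWIN bit is never taken from the caller; it is always
 * re-derived from the request count.
 */
static void set_or_clear_inhibit(struct vm_model *vm, int reason, bool set)
{
	unsigned long new = vm->inhibit_reasons;

	vm->write_locks++;

	if (reason != REASON_IRQWIN) {
		if (set)
			new |= BIT(reason);
		else
			new &= ~BIT(reason);
	}

	if (vm->nr_irq_window_req)
		new |= BIT(REASON_IRQWIN);
	else
		new &= ~BIT(REASON_IRQWIN);

	vm->inhibit_reasons = new;
}

/* Models an IRQ window request (inc) or its completion (!inc). */
static void inc_or_dec_irq_window(struct vm_model *vm, bool inc)
{
	int add = inc ? 1 : -1;

	vm->nr_irq_window_req += add;
	assert(vm->nr_irq_window_req >= 0);

	/* Fast path: another inhibit is set, so only the count changes. */
	if (vm->inhibit_reasons & ~BIT(REASON_IRQWIN))
		return;

	/* Slow path (simplified): re-evaluate the mask on every call. */
	set_or_clear_inhibit(vm, REASON_IRQWIN, inc);
}
```

With REASON_PIT_REINJ set once, any number of balanced window requests
leaves write_locks unchanged in this model; the IRQWIN bit only appears
in the mask once another reason is set or cleared while a request is
outstanding.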

---
 arch/x86/kvm/x86.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

I think patch tags for this should be:
	From: Sean Christopherson <seanjc@...gle.com>

	Signed-off-by: Sean Christopherson <seanjc@...gle.com>
	Co-developed-by: Paolo Bonzini <pbonzini@...hat.com>
	Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
	Co-developed-by: Naveen N Rao (AMD) <naveen@...nel.org>
	Signed-off-by: Naveen N Rao (AMD) <naveen@...nel.org>

- Naveen

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 216d1801a4f2..845afcf6e85f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10534,7 +10534,11 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,

 	old = new = kvm->arch.apicv_inhibit_reasons;

-	set_or_clear_apicv_inhibit(&new, reason, set);
+	if (reason != APICV_INHIBIT_REASON_IRQWIN)
+		set_or_clear_apicv_inhibit(&new, reason, set);
+
+	set_or_clear_apicv_inhibit(&new, APICV_INHIBIT_REASON_IRQWIN,
+				   atomic_read(&kvm->arch.apicv_nr_irq_window_req));

 	if (!!old != !!new) {
 		/*
@@ -10582,6 +10586,26 @@ void kvm_inc_or_dec_irq_window_inhibit(struct kvm *kvm, bool inc)
 	if (!enable_apicv)
 		return;

+	/*
+	 * IRQ windows are requested either because of ExtINT injections, or
+	 * because APICv is already disabled/inhibited for another reason.
+	 * While ExtINT injections are rare and should not happen while the
+	 * vCPU is running its actual workload, it's worth avoiding thrashing
+	 * if the IRQ window is being requested because APICv is already
+	 * inhibited.  So, toggle the actual inhibit (which requires taking
+	 * the lock for write) if and only if there's no other inhibit.
+	 * kvm_set_or_clear_apicv_inhibit() always evaluates the IRQ window
+	 * count; thus the IRQ window inhibit call _will_ be lazily updated on
+	 * the next call, if it ever happens.
+	 */
+	if (READ_ONCE(kvm->arch.apicv_inhibit_reasons) & ~BIT(APICV_INHIBIT_REASON_IRQWIN)) {
+		guard(rwsem_read)(&kvm->arch.apicv_update_lock);
+		if (READ_ONCE(kvm->arch.apicv_inhibit_reasons) & ~BIT(APICV_INHIBIT_REASON_IRQWIN)) {
+			atomic_add(add, &kvm->arch.apicv_nr_irq_window_req);
+			return;
+		}
+	}
+
 	/*
 	 * Strictly speaking, the lock is only needed if going 0->1 or 1->0,
 	 * a la atomic_dec_and_mutex_lock.  However, ExtINTs are rare and
-- 
2.50.1