[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e09f0be9-6a2f-a8ee-3a96-c8ffdf3add3f@redhat.com>
Date: Sat, 2 May 2020 19:19:32 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: Suravee Suthikulpanit <suravee.suthikulpanit@....com>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc: rkrcmar@...hat.com, joro@...tes.org, jon.grimm@....com,
borisvk@...net.org
Subject: Re: [PATCH v2] kvm: ioapic: Introduce arch-specific check for lazy
update EOI mechanism
On 02/05/20 11:24, Suravee Suthikulpanit wrote:
> This is due to re-entrancy of the lazy update EOI logic
> when enable APICv with VFIO pass-through device, which
> sets up kvm_irqfd() w/ KVM_IRQFD_FLAG_RESAMPLE.
>
> Fixes by adding re-entrancy check logic.
This does not explain why this is the right fix.
The questions to answer are: what is causing the re-entrancy? and why
is dropping the second EOI update safe?
The answer to the latter could well be "because we've already processed
it", but the answer to the former is more important.
The re-entrancy happens because the irq state is the OR of
the interrupt state and the resamplefd state. That is, we don't
want to show the state as 0 until we've had a chance to set the
resamplefd. But if the interrupt has _not_ gone low then we get an
infinite loop.
So the actual root cause is that this is a level-triggered interrupt,
otherwise irqfd_inject would immediately set the KVM_USERSPACE_IRQ_SOURCE_ID
high and then low and you wouldn't have the infinite loop. But in the
case of level-triggered interrupts the VMEXIT already happens because
TMR is set; only edge-triggered interrupts need the lazy invocation
of the ack notifier. So this should be the fix:
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 7668fed1ce65..ca2d73cd00a3 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -225,12 +225,12 @@ static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
}
/*
- * AMD SVM AVIC accelerate EOI write and do not trap,
- * in-kernel IOAPIC will not be able to receive the EOI.
+ * AMD SVM AVIC accelerate EOI write iff the interrupt is level
+ * triggered, in-kernel IOAPIC will not be able to receive the EOI.
* In this case, we do lazy update of the pending EOI when
* trying to set IOAPIC irq.
*/
- if (kvm_apicv_activated(ioapic->kvm))
+ if (edge && kvm_apicv_activated(ioapic->kvm))
ioapic_lazy_update_eoi(ioapic, irq);
/*
Did I miss anything in the above analysis with respect to AVIC?
Paolo
Powered by blists - more mailing lists