[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <752f5b0a-381c-d559-ec8a-dc41b1e36010@scalemp.com>
Date: Wed, 13 Mar 2019 09:30:06 +0200
From: Oren Twaig <oren@...lemp.com>
To: "Suthikulpanit, Suravee" <Suravee.Suthikulpanit@....com>
Cc: "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] svm: Fix AVIC incomplete IPI emulation
Hi Suravee,
Please see below..
On 03/11/2019 01:38 PM, Suthikulpanit, Suravee wrote:
> Hi Oren,
>
> Sorry for delay response.
>
> On 3/5/19 1:15 AM, Oren Twaig wrote:
>> Hello Suravee,
>>
>> According to AMD's SDM, the target-not-running incomplete
>> ipi exit is only received if any of the destination cpus had the
>> not-running bit set in the avic backing page.
> I believe you are referring to the "isRunning" (IR) bit is in the
> AVIC physical APIC ID table entry.
I meant cause ID=1 in IPI Delivery Failure Cause (SDM rev 3.30, sep
2018, Table 15-28):
"
1: IPI Target Not Running:
IsRunning bit of the target for a
Singlecast/Broadcast/Multicast IPI is not set in
the physical APIC ID table.
"
>
>> However, not before the CPU _already_ set the relevant IRR bit
>> in all these cpus.
> Not sure what you meant here.
Here is the full snippet from the specifications:
"
5.For every valid destination:
- Atomically set the appropriate IRR bit in each of the destinations’
vAPIC backing page.
- Check the IsRunning status of each destination.
- If the destination IsRunning bit is set, send a doorbell message
using the host physical core number from the Physical APIC ID table.
6. If any destinations are identified as not currently scheduled on
a physical core (i.e., the IsRunning
bit for that virtual processor is not set), cause a #VMEXIT.
"
According to the specification above, the HW should first
set the appropriate bit in the IRR (Interrupt Request Register)
_before_ causing VMEXIT of IPI-delivery-not-completed
with ID=1 (Target not running).
>
>> In this change, the patch forces KVM to send another interrupt
>> to the vcpu whether SVM already did that or not. Which means
>> the vcpu/s, under some conditions, can get an EXTRA interrupt
>> it never intended to get >> Example:
>> 1. vcpu B: Is in "not-running" state.
>> 2. vcpu A: Writes to the ICR to send vector 80 to vcpu B
>> 3. vcpu A: SVM updates vcpu B IRR with bit 80
>> 4. vcpu A: SVM exits on incomplete IPI target-not-running exit.
>> 5. vcpu A: Now stops executing any code @ hypervisor level.
>> 6. vcpu B: Due to another interrupt (like lapic timer)
>> resumes running the guest. While handling interrupts,
>> it also handles interrupt vector 80 (as it's in his IRR)
>> 7. vcpu A: resumes executing the below code and sends
>> an _additional_interrupt to vcpu B.
>>
>> Overall, vcpu B got two interrupts. The second is unwanted and
>> not documented in the system architecture.
>>
>> Can you please elaborate more to why the implementation
>> below conflict with the specifications (which was the code
>> before this commit) ?
> This patch was introduced to fix an issue where the SVM driver tries to
> handle the step 5 above by scheduling vcpu B into _running_ state to handle
> the IPI from vcpu A. However, prior to this patch, vcpu B was never get
> scheduled to run unless there are other interrupts (e.g. timer).
Exactly. Only what needed here is *only* to wakeup the vcpu B. Why ? because
the apic of vcpu B _already_ contains the interrupt in the pending
IRR. Than, once vcpu B will run it will process the IRR which contains
the vector placed by the HW and will deliver it.
> This should not be the case as Vcpu B should have been running regardless
> of other interrupts. So, I don't think step 6 and 7 above are correct.
The example of vcpu A that stops executing is just to highlight that
the code can't depend on that the kvm code of vcpu A will finish the
ICR "fake" call before vcpu B runs (beacuse of any interrupt) and process
that IRR request placed by the HW.
>
> The issue was caused by the apic->irr_pending not set to true when trying to
> get vcpu B scheduled. This flag is checked in apic_find_highest_irr() before
> searching for the highest bit.
>
> To fix the issue, I decided to leverage the existing emulation code for
> ICR and ICR2, which in turn calls apic_send_ipi() to deliver interrupt to vpu B.
>
> However, looking a bit more closely, I notice the logic in svm_deliver_avic_intr()
> should also have been changed from kvm_vcpu_wake_up() to kvm_vcpu_kick()
> since the latter will result in clearing the IRR bit for the IPI vector
> when trying to send IPI as part of the following call path.
>
> vcpu_enter_guest()
> |-- inject_pending_event()
> |-- kvm_cpu_get_interrupt()
> |-- kvm_get_apic_interrupt()
> |-- apic_clear_irr()
> |-- apic_set_isr()
> |-- apic_update_ppr() ....
>
> Please see the patch below.
>
> Not sure if this would address the problem you are seeing.
I still think there a bug here where vcpu B will get two interrupts
instead of one.
Thanks,
Oren
>
> Thanks,
> Suravee
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 24dfa6a93711..d2841c3dbc04 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -5219,11 +5256,13 @@ static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> kvm_lapic_set_irr(vec, vcpu->arch.apic);
> smp_mb__after_atomic();
>
> - if (avic_vcpu_is_running(vcpu))
> + if (avic_vcpu_is_running(vcpu)) {
> wrmsrl(SVM_AVIC_DOORBELL,
> kvm_cpu_get_apicid(vcpu->cpu));
> - else
> - kvm_vcpu_wake_up(vcpu);
> + } else {
> + kvm_make_request(KVM_REQ_EVENT, vcpu);
> + kvm_vcpu_kick(vcpu);
> + }
> }
>
> static void svm_ir_list_del(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
Powered by blists - more mailing lists