linux-kernel - Re: [PATCH] svm: Fix AVIC incomplete IPI emulation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <752f5b0a-381c-d559-ec8a-dc41b1e36010@scalemp.com>
Date:   Wed, 13 Mar 2019 09:30:06 +0200
From:   Oren Twaig <oren@...lemp.com>
To:     "Suthikulpanit, Suravee" <Suravee.Suthikulpanit@....com>
Cc:     "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] svm: Fix AVIC incomplete IPI emulation

Hi Suravee,

Please see below..

On 03/11/2019 01:38 PM, Suthikulpanit, Suravee wrote:
> Hi Oren,
>
> Sorry for delay response.
>
> On 3/5/19 1:15 AM, Oren Twaig wrote:
>> Hello Suravee,
>>
>> According to AMD's SDM, the target-not-running incomplete
>> ipi exit is only received if any of the destination cpus had the
>> not-running bit set in the avic backing page.
> I believe you are referring to the "isRunning" (IR) bit is in the
> AVIC physical APIC ID table entry.
I meant cause ID=1 in IPI Delivery Failure Cause (SDM rev 3.30, sep 
2018, Table 15-28):

"
1: IPI Target Not Running:

IsRunning bit of the target for a
Singlecast/Broadcast/Multicast IPI is not set in
the physical APIC ID table.

"

>
>> However, not before the CPU _already_ set the relevant IRR bit
>> in all these cpus.
> Not sure what you meant here.

Here is the full snippet from the specifications:
"
5.For every valid destination:
- Atomically set the appropriate IRR bit in each of the destinations’
   vAPIC backing page.
- Check the IsRunning status of each destination.
- If the destination IsRunning bit is set, send a doorbell message
   using the host physical core number from the Physical APIC ID table.

6. If any destinations are identified as not currently scheduled on
  a physical core (i.e., the IsRunning
  bit for that virtual processor is not set), cause a #VMEXIT.
"

According to the specification above, the HW should first
set the appropriate bit in the IRR (Interrupt Request Register)
_before_ causing VMEXIT of IPI-delivery-not-completed
with  ID=1 (Target not running).

>
>> In this change, the patch forces KVM to send another interrupt
>> to the vcpu whether SVM already did that or not. Which means
>> the vcpu/s, under some conditions, can get an EXTRA interrupt
>> it never intended to get >> Example:
>>     1. vcpu B: Is in "not-running" state.
>>     2. vcpu A: Writes to the ICR to send vector 80 to vcpu B
>>     3. vcpu A: SVM updates vcpu B IRR with bit 80
>>     4. vcpu A: SVM exits on incomplete IPI target-not-running exit.
>>     5. vcpu A: Now stops executing any code @ hypervisor level.
>>     6. vcpu B: Due to another interrupt (like lapic timer)
>>        resumes running the guest. While handling interrupts,
>>        it also handles interrupt vector 80 (as it's in his IRR)
>>     7. vcpu A: resumes executing the below code and sends
>>        an _additional_interrupt to vcpu B.
>>
>> Overall, vcpu B got two interrupts. The second is unwanted and
>> not documented in the system architecture.
>>
>> Can you please elaborate more to why the implementation
>> below conflict with the specifications (which was the code
>> before this commit) ?
> This patch was introduced to fix an issue where the SVM driver tries to
> handle the step 5 above by scheduling vcpu B into _running_ state to handle
> the IPI from vcpu A. However, prior to this patch, vcpu B was never get
> scheduled to run unless there are other interrupts (e.g. timer).
Exactly. Only what needed here is *only* to wakeup the vcpu B. Why ? because
the apic of vcpu B _already_ contains the interrupt in the pending
IRR. Than, once vcpu B will run it will process the IRR which contains
the vector placed by the HW and will deliver it.
> This should not be the case as Vcpu B should have been running regardless
> of other interrupts. So, I don't think step 6 and 7 above are correct.
The example of vcpu A that stops executing is just to highlight that
the code can't depend on that the kvm code of vcpu A will finish the
ICR "fake" call before vcpu B runs (beacuse of any interrupt) and process
that IRR request placed by the HW.
>
> The issue was caused by the apic->irr_pending not set to true when trying to
> get vcpu B scheduled. This flag is checked in apic_find_highest_irr() before
> searching for the highest bit.
>
> To fix the issue, I decided to leverage the existing emulation code for
> ICR and ICR2, which in turn calls apic_send_ipi() to deliver interrupt to vpu B.
>
> However, looking a bit more closely, I notice the logic in svm_deliver_avic_intr()
> should also have been changed from kvm_vcpu_wake_up() to kvm_vcpu_kick()
> since the latter will result in clearing the IRR bit for the IPI vector
> when trying to send IPI as part of the following call path.
>
>     vcpu_enter_guest()
>       |-- inject_pending_event()
>         |-- kvm_cpu_get_interrupt()
>           |--  kvm_get_apic_interrupt()
>             |-- apic_clear_irr()
>             |-- apic_set_isr()
>             |-- apic_update_ppr() ....
>
> Please see the patch below.
>
> Not sure if this would address the problem you are seeing.
I still think there a bug here where vcpu B will get two interrupts
instead of one.

Thanks,
Oren
>
> Thanks,
> Suravee
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 24dfa6a93711..d2841c3dbc04 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -5219,11 +5256,13 @@ static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>           kvm_lapic_set_irr(vec, vcpu->arch.apic);
>           smp_mb__after_atomic();
>
> -       if (avic_vcpu_is_running(vcpu))
> +       if (avic_vcpu_is_running(vcpu)) {
>                   wrmsrl(SVM_AVIC_DOORBELL,
>                          kvm_cpu_get_apicid(vcpu->cpu));
> -       else
> -               kvm_vcpu_wake_up(vcpu);
> +       } else {
> +               kvm_make_request(KVM_REQ_EVENT, vcpu);
> +               kvm_vcpu_kick(vcpu);
> +       }
>    }
>
>    static void svm_ir_list_del(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)