[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANRm+CxMf=kwDRQE-BNbhgCARuV3fuKpDbEV2oWTeKuGhUYd+w@mail.gmail.com>
Date:   Thu, 22 Apr 2021 20:21:47 +0800
From:   Wanpeng Li <kernellwp@...il.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     Kenta Ishiguro <kentaishiguro@...ab.ics.keio.ac.jp>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        David Hildenbrand <david@...hat.com>,
        kvm <kvm@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
        Pierre-Louis Aublin <pl@...ab.ics.keio.ac.jp>,
        河野健二 <kono@...ab.ics.keio.ac.jp>
Subject: Re: [RFC PATCH 0/2] Mitigating Excessive Pause-Loop Exiting in
 VM-Agnostic KVM
On Thu, 22 Apr 2021 at 09:45, Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Thu, Apr 22, 2021, Kenta Ishiguro wrote:
> > To solve problems (2) and (3), patch 2 monitors IPI communication between
> > vCPUs and leverages the relationship between vCPUs to select boost
> > candidates.  The "[PATCH] KVM: Boost vCPU candidiate in user mode which is
> > delivering interrupt" patch
> > (https://lore.kernel.org/kvm/CANRm+Cy-78UnrkX8nh5WdHut2WW5NU=UL84FRJnUNjsAPK+Uww@mail.gmail.com/T/)
> > seems to be effective for (2) while it only uses the IPI receiver
> > information.
>
> On the IPI side of thing, I like the idea of explicitly tracking the IPIs,
> especially if we can simplify the implementation, e.g. by losing the receiver
> info and making ipi_received a bool.  Maybe temporarily table Wanpeng's patch
> while this approach is analyzed?
Hi all,
I evaluate my patch
(https://lore.kernel.org/kvm/1618542490-14756-1-git-send-email-wanpengli@tencent.com),
Kenta's patch 2 and Sean's suggestion. The testing environment is
pbzip2 in 96 vCPUs VM in over-subscribe scenario (The host machine is
2 socket, 48 cores, 96 HTs Intel CLX box). Note: the Kenta's scheduler
hacking is not applied. The score of my patch is the most stable and
the best performance.
Wanpeng's patch
The average: vanilla -> boost: 69.124 -> 61.975, 10.3%
* Wall Clock: 61.695359 seconds
* Wall Clock: 63.343579 seconds
* Wall Clock: 61.567513 seconds
* Wall Clock: 62.144722 seconds
* Wall Clock: 61.091442 seconds
* Wall Clock: 62.085912 seconds
* Wall Clock: 61.311954 seconds
Kenta' patch
The average: vanilla -> boost: 69.148 -> 64.567, 6.6%
* Wall Clock:  66.288113 seconds
* Wall Clock:  61.228642 seconds
* Wall Clock:  62.100524 seconds
* Wall Clock:  68.355473 seconds
* Wall Clock:  64.864608 seconds
Sean's suggestion:
The average: vanilla -> boost: 69.148 -> 66.505, 3.8%
* Wall Clock: 60.583562 seconds
* Wall Clock: 58.533960 seconds
* Wall Clock: 70.103489 seconds
* Wall Clock: 74.279028 seconds
* Wall Clock: 69.024194 seconds
I follow(almost) Sean's suggestion:
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0050f39..78b5eb6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1272,6 +1272,7 @@ EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated);
 void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
 {
     struct kvm_lapic_irq irq;
+    struct kvm_vcpu *dest_vcpu;
     irq.vector = icr_low & APIC_VECTOR_MASK;
     irq.delivery_mode = icr_low & APIC_MODE_MASK;
@@ -1285,6 +1286,10 @@ void kvm_apic_send_ipi(struct kvm_lapic *apic,
u32 icr_low, u32 icr_high)
     else
         irq.dest_id = GET_APIC_DEST_FIELD(icr_high);
+    dest_vcpu = kvm_get_vcpu_by_id(apic->vcpu->kvm, irq.dest_id);
+    if (dest_vcpu)
+        WRITE_ONCE(dest_vcpu->ipi_received, true);
+
     trace_kvm_apic_ipi(icr_low, irq.dest_id);
     kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 303fb55..a98bf571 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9298,6 +9298,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
     if (test_thread_flag(TIF_NEED_FPU_LOAD))
         switch_fpu_return();
+    WRITE_ONCE(vcpu->ipi_received, false);
+
     if (unlikely(vcpu->arch.switch_db_regs)) {
         set_debugreg(0, 7);
         set_debugreg(vcpu->arch.eff_db[0], 0);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5ef09a4..81e39fa 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -332,6 +332,8 @@ struct kvm_vcpu {
         bool dy_eligible;
     } spin_loop;
 #endif
+
+    bool ipi_received;
     bool preempted;
     bool ready;
     struct kvm_vcpu_arch arch;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c682f82..5098929 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -411,6 +411,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu,
struct kvm *kvm, unsigned id)
     kvm_vcpu_set_in_spin_loop(vcpu, false);
     kvm_vcpu_set_dy_eligible(vcpu, false);
+    vcpu->ipi_received = false;
     vcpu->preempted = false;
     vcpu->ready = false;
     preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);
@@ -3220,6 +3221,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool
yield_to_kernel_mode)
                 !vcpu_dy_runnable(vcpu))
                 continue;
             if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
+                !READ_ONCE(vcpu->ipi_received) &&
                 !kvm_arch_vcpu_in_kernel(vcpu))
                 continue;
             if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
Powered by blists - more mailing lists
 
