Date:   Wed, 20 Nov 2019 08:34:00 +0800
From:   Wanpeng Li <kernellwp@...il.com>
To:     Liran Alon <liran.alon@...cle.com>
Cc:     Sean Christopherson <sean.j.christopherson@...el.com>,
        LKML <linux-kernel@...r.kernel.org>, kvm <kvm@...r.kernel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Radim Krčmář <rkrcmar@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>
Subject: Re: [PATCH v2 1/2] KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath

On Wed, 20 Nov 2019 at 02:58, Liran Alon <liran.alon@...cle.com> wrote:
>
>
>
> > On 19 Nov 2019, at 20:36, Sean Christopherson <sean.j.christopherson@...el.com> wrote:
> >
> > On Tue, Nov 19, 2019 at 02:36:28PM +0800, Wanpeng Li wrote:
> >> From: Wanpeng Li <wanpengli@...cent.com>
> >>
> >> In our production observations, writes to the ICR and TSCDEADLINE
> >> MSRs cause the majority of MSR-write vmexits, and multicast IPIs are
> >> not as common as unicast IPIs such as RESCHEDULE_VECTOR and
> >> CALL_FUNCTION_SINGLE_VECTOR.
> >>
> >> This patch optimizes the single-target IPI case for x2apic physical
> >> destination mode with fixed delivery mode: the IPI is delivered to
> >> the receiver as soon as possible after the sender's ICR-write vmexit,
> >> avoiding various checks where possible. This helps especially when
> >> the guest runs w/ --overcommit cpu-pm=on, or when the guest can keep
> >> running and the IPI can be injected into the target vCPU immediately
> >> via posted interrupts.
> >>
> >> Tested on a Xeon Skylake server:
> >>
> >> The virtual IPI latency from the sender's ICR write to the receiver's
> >> interrupt delivery is reduced by more than 200 CPU cycles.
> >>
> >> Signed-off-by: Wanpeng Li <wanpengli@...cent.com>
> >> ---
> >> v1 -> v2:
> >> * add tracepoint
> >> * Instead of a separate vcpu->fast_vmexit, set vmx->exit_reason
> >>   to -1 if the fast path succeeds.
> >> * move the "kvm_skip_emulated_instruction(vcpu)" to vmx_handle_exit
> >> * move the handling into vmx_handle_exit_irqoff()
> >>
> >> arch/x86/include/asm/kvm_host.h |  4 ++--
> >> arch/x86/include/uapi/asm/vmx.h |  1 +
> >> arch/x86/kvm/svm.c              |  4 ++--
> >> arch/x86/kvm/vmx/vmx.c          | 40 +++++++++++++++++++++++++++++++++++++---
> >> arch/x86/kvm/x86.c              |  5 +++--
> >> 5 files changed, 45 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index 898ab9e..0daafa9 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -1084,7 +1084,7 @@ struct kvm_x86_ops {
> >>      void (*tlb_flush_gva)(struct kvm_vcpu *vcpu, gva_t addr);
> >>
> >>      void (*run)(struct kvm_vcpu *vcpu);
> >> -    int (*handle_exit)(struct kvm_vcpu *vcpu);
> >> +    int (*handle_exit)(struct kvm_vcpu *vcpu, u32 *vcpu_exit_reason);
> >>      int (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
> >>      void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
> >>      u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu);
> >> @@ -1134,7 +1134,7 @@ struct kvm_x86_ops {
> >>      int (*check_intercept)(struct kvm_vcpu *vcpu,
> >>                             struct x86_instruction_info *info,
> >>                             enum x86_intercept_stage stage);
> >> -    void (*handle_exit_irqoff)(struct kvm_vcpu *vcpu);
> >> +    void (*handle_exit_irqoff)(struct kvm_vcpu *vcpu, u32 *vcpu_exit_reason);
> >>      bool (*mpx_supported)(void);
> >>      bool (*xsaves_supported)(void);
> >>      bool (*umip_emulated)(void);
> >> diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
> >> index 3eb8411..b33c6e1 100644
> >> --- a/arch/x86/include/uapi/asm/vmx.h
> >> +++ b/arch/x86/include/uapi/asm/vmx.h
> >> @@ -88,6 +88,7 @@
> >> #define EXIT_REASON_XRSTORS             64
> >> #define EXIT_REASON_UMWAIT              67
> >> #define EXIT_REASON_TPAUSE              68
> >> +#define EXIT_REASON_NEED_SKIP_EMULATED_INSN -1
> >>
> >> #define VMX_EXIT_REASONS \
> >>      { EXIT_REASON_EXCEPTION_NMI,         "EXCEPTION_NMI" }, \
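For readers following along, here is a minimal stand-alone C sketch of the
eligibility test the commit message describes: an ICR write can take the
fastpath only when it requests fixed delivery mode, physical destination
mode, and no destination shorthand, i.e. a single target. The bit layout
follows the Intel SDM; the macro and function names below are illustrative
rather than the patch's actual identifiers. In the patch itself, success is
signalled by setting vmx->exit_reason to EXIT_REASON_NEED_SKIP_EMULATED_INSN
(-1) so that vmx_handle_exit later performs kvm_skip_emulated_instruction().

/*
 * Illustrative sketch, not the patch body: decide whether an x2apic
 * ICR write qualifies for the single-target IPI fastpath.  x2apic ICR
 * layout per the Intel SDM: bits 7:0 vector, bits 10:8 delivery mode
 * (000b = fixed), bit 11 destination mode (0 = physical), bits 19:18
 * destination shorthand (00b = none), bits 63:32 destination APIC ID.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ICR_DELIVERY_MODE_MASK  (0x7ULL << 8)
#define ICR_DELIVERY_FIXED      (0x0ULL << 8)
#define ICR_DEST_MODE_LOGICAL   (1ULL << 11)
#define ICR_SHORTHAND_MASK      (0x3ULL << 18)

static bool icr_single_target_fixed_physical(uint64_t icr)
{
	return (icr & ICR_DELIVERY_MODE_MASK) == ICR_DELIVERY_FIXED &&
	       !(icr & ICR_DEST_MODE_LOGICAL) &&
	       !(icr & ICR_SHORTHAND_MASK);
}

int main(void)
{
	/* Fixed delivery, physical mode, destination APIC ID 3, vector 0xfd. */
	uint64_t icr = (3ULL << 32) | 0xfd;

	printf("fastpath eligible: %d\n",
	       icr_single_target_fixed_physical(icr));
	return 0;
}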
> >
> > Rather than pass a custom exit reason around, can we simply handle *all*
> > x2apic ICR writes during handle_exit_irqoff() for both VMX and SVM?  The
> > only risk I can think of is that KVM could stall too long before enabling
> > IRQs.
> >
>
> I agree that if it doesn’t cause us to run too long with interrupts disabled, then this is a nicer approach.

In x2apic cluster mode, each cluster can contain up to 16 logical IDs,
so in the worst case the target vCPUs have to be woken up one by one,
and select_idle_sibling() in the scheduler is well known as
CPU-burning search logic. I'm afraid we would extend the
interrupts-disabled and preemption-off time too much.
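
To make that worst case concrete, here is a small stand-alone C sketch of
the x2apic logical destination format that drives it (names illustrative):
in cluster mode the 32-bit destination splits into a 16-bit cluster ID and
a 16-bit member bitmask, so a single logical-mode IPI can address up to 16
vCPUs, each of which may have to be woken up separately.

#include <stdint.h>
#include <stdio.h>

/*
 * x2apic logical destination, cluster mode, per the Intel SDM:
 * bits 31:16 = cluster ID, bits 15:0 = one-hot member bitmask,
 * so one logical-destination IPI can name up to 16 CPUs.
 */
static unsigned int x2apic_cluster_targets(uint32_t dest)
{
	uint16_t mask = dest & 0xffff;	/* member bitmask */
	unsigned int n = 0;

	while (mask) {			/* count set bits = target count */
		mask &= mask - 1;
		n++;
	}
	return n;
}

int main(void)
{
	/* Cluster 2 with all 16 members addressed: the worst case above. */
	uint32_t dest = (2u << 16) | 0xffffu;

	printf("targets in one IPI: %u\n", x2apic_cluster_targets(dest));
	return 0;
}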

> However, I think we may generalise this patch a bit into a clear code path where accelerated exit handling
> should be put. See my other reply in this email thread and tell me what you think:
> https://www.spinics.net/lists/kernel/msg3322282.html

Thanks for the nicer code-path suggestion, I will try it. :)

    Wanpeng
