Date:   Wed, 20 Nov 2019 08:34:00 +0800
From:   Wanpeng Li <kernellwp@...il.com>
To:     Liran Alon <liran.alon@...cle.com>
Cc:     Sean Christopherson <sean.j.christopherson@...el.com>,
        LKML <linux-kernel@...r.kernel.org>, kvm <kvm@...r.kernel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Radim Krčmář <rkrcmar@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>
Subject: Re: [PATCH v2 1/2] KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath

On Wed, 20 Nov 2019 at 02:58, Liran Alon <liran.alon@...cle.com> wrote:
>
>
>
> > On 19 Nov 2019, at 20:36, Sean Christopherson <sean.j.christopherson@...el.com> wrote:
> >
> > On Tue, Nov 19, 2019 at 02:36:28PM +0800, Wanpeng Li wrote:
> >> From: Wanpeng Li <wanpengli@...cent.com>
> >>
> >> In our production observations, writes to the ICR and TSCDEADLINE
> >> MSRs cause the majority of MSR-write vmexits, and multicast IPIs are
> >> not as common as unicast IPIs such as RESCHEDULE_VECTOR and
> >> CALL_FUNCTION_SINGLE_VECTOR.
> >>
> >> This patch optimizes the single-target IPI case for x2apic physical
> >> destination mode with fixed delivery mode: the IPI is delivered to
> >> the receiver as soon as possible after the sender's ICR-write vmexit,
> >> avoiding various checks where possible. This helps especially when
> >> the guest runs w/ --overcommit cpu-pm=on, or when the guest can keep
> >> running and the IPI can be injected into the target vCPU immediately
> >> via posted interrupts.
> >>
> >> Tested on a Xeon Skylake server:
> >>
> >> The virtual IPI latency from the sender's ICR write to the receiver's
> >> interrupt delivery is reduced by more than 200 CPU cycles.
> >>
> >> Signed-off-by: Wanpeng Li <wanpengli@...cent.com>
> >> ---
> >> v1 -> v2:
> >> * add tracepoint
> >> * Instead of a separate vcpu->fast_vmexit, set vmx->exit_reason
> >>   to -1 if the fast path succeeds.
> >> * move the "kvm_skip_emulated_instruction(vcpu)" to vmx_handle_exit
> >> * move the handling into vmx_handle_exit_irqoff()
> >>
> >> arch/x86/include/asm/kvm_host.h |  4 ++--
> >> arch/x86/include/uapi/asm/vmx.h |  1 +
> >> arch/x86/kvm/svm.c              |  4 ++--
> >> arch/x86/kvm/vmx/vmx.c          | 40 +++++++++++++++++++++++++++++++++++++---
> >> arch/x86/kvm/x86.c              |  5 +++--
> >> 5 files changed, 45 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index 898ab9e..0daafa9 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -1084,7 +1084,7 @@ struct kvm_x86_ops {
> >>      void (*tlb_flush_gva)(struct kvm_vcpu *vcpu, gva_t addr);
> >>
> >>      void (*run)(struct kvm_vcpu *vcpu);
> >> -    int (*handle_exit)(struct kvm_vcpu *vcpu);
> >> +    int (*handle_exit)(struct kvm_vcpu *vcpu, u32 *vcpu_exit_reason);
> >>      int (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
> >>      void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
> >>      u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu);
> >> @@ -1134,7 +1134,7 @@ struct kvm_x86_ops {
> >>      int (*check_intercept)(struct kvm_vcpu *vcpu,
> >>                             struct x86_instruction_info *info,
> >>                             enum x86_intercept_stage stage);
> >> -    void (*handle_exit_irqoff)(struct kvm_vcpu *vcpu);
> >> +    void (*handle_exit_irqoff)(struct kvm_vcpu *vcpu, u32 *vcpu_exit_reason);
> >>      bool (*mpx_supported)(void);
> >>      bool (*xsaves_supported)(void);
> >>      bool (*umip_emulated)(void);
> >> diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
> >> index 3eb8411..b33c6e1 100644
> >> --- a/arch/x86/include/uapi/asm/vmx.h
> >> +++ b/arch/x86/include/uapi/asm/vmx.h
> >> @@ -88,6 +88,7 @@
> >> #define EXIT_REASON_XRSTORS             64
> >> #define EXIT_REASON_UMWAIT              67
> >> #define EXIT_REASON_TPAUSE              68
> >> +#define EXIT_REASON_NEED_SKIP_EMULATED_INSN -1
> >>
> >> #define VMX_EXIT_REASONS \
> >>      { EXIT_REASON_EXCEPTION_NMI,         "EXCEPTION_NMI" }, \
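For readers following along, here is a minimal stand-alone C sketch of the
eligibility test the commit message describes: an ICR write can take the
fastpath only when it requests fixed delivery mode, physical destination
mode, and no destination shorthand, i.e. a single target. The bit layout
follows the Intel SDM; the macro and function names below are illustrative
rather than the patch's actual identifiers. In the patch itself, success is
signalled by setting vmx->exit_reason to EXIT_REASON_NEED_SKIP_EMULATED_INSN
(-1) so that vmx_handle_exit later performs kvm_skip_emulated_instruction().

/*
 * Illustrative sketch, not the patch body: decide whether an x2apic
 * ICR write qualifies for the single-target IPI fastpath.  x2apic ICR
 * layout per the Intel SDM: bits 7:0 vector, bits 10:8 delivery mode
 * (000b = fixed), bit 11 destination mode (0 = physical), bits 19:18
 * destination shorthand (00b = none), bits 63:32 destination APIC ID.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ICR_DELIVERY_MODE_MASK  (0x7ULL << 8)
#define ICR_DELIVERY_FIXED      (0x0ULL << 8)
#define ICR_DEST_MODE_LOGICAL   (1ULL << 11)
#define ICR_SHORTHAND_MASK      (0x3ULL << 18)

static bool icr_single_target_fixed_physical(uint64_t icr)
{
	return (icr & ICR_DELIVERY_MODE_MASK) == ICR_DELIVERY_FIXED &&
	       !(icr & ICR_DEST_MODE_LOGICAL) &&
	       !(icr & ICR_SHORTHAND_MASK);
}

int main(void)
{
	/* Fixed delivery, physical mode, destination APIC ID 3, vector 0xfd. */
	uint64_t icr = (3ULL << 32) | 0xfd;

	printf("fastpath eligible: %d\n",
	       icr_single_target_fixed_physical(icr));
	return 0;
}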
> >
> > Rather than pass a custom exit reason around, can we simply handle *all*
> > x2apic ICR writes during handle_exit_irqoff() for both VMX and SVM?  The
> > only risk I can think of is that KVM could stall too long before enabling
> > IRQs.
> >
>
> I agree that if it doesn’t cause us to run too long with interrupts disabled, then this is a nicer approach.

In x2apic cluster mode, each cluster can contain up to 16 logical IDs,
so in the worst case the target vCPUs have to be woken up one by one,
and select_idle_sibling() in the scheduler is well known as
CPU-burning search logic. I'm afraid we would extend the
interrupts-disabled and preemption-off time too much.
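
To make that worst case concrete, here is a small stand-alone C sketch of
the x2apic logical destination format that drives it (names illustrative):
in cluster mode the 32-bit destination splits into a 16-bit cluster ID and
a 16-bit member bitmask, so a single logical-mode IPI can address up to 16
vCPUs, each of which may have to be woken up separately.

#include <stdint.h>
#include <stdio.h>

/*
 * x2apic logical destination, cluster mode, per the Intel SDM:
 * bits 31:16 = cluster ID, bits 15:0 = one-hot member bitmask,
 * so one logical-destination IPI can name up to 16 CPUs.
 */
static unsigned int x2apic_cluster_targets(uint32_t dest)
{
	uint16_t mask = dest & 0xffff;	/* member bitmask */
	unsigned int n = 0;

	while (mask) {			/* count set bits = target count */
		mask &= mask - 1;
		n++;
	}
	return n;
}

int main(void)
{
	/* Cluster 2 with all 16 members addressed: the worst case above. */
	uint32_t dest = (2u << 16) | 0xffffu;

	printf("targets in one IPI: %u\n", x2apic_cluster_targets(dest));
	return 0;
}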

> However, I think we may generalise this patch a bit into a clear code path where accelerated exit handling
> should be put. See my other reply in this email thread and tell me what you think:
> https://www.spinics.net/lists/kernel/msg3322282.html

Thanks for the nicer code-path suggestion, I will try it. :)

    Wanpeng
