Message-ID: <ec0a6376-9879-2277-9943-0d73339aadbb@amd.com>
Date: Thu, 7 Jan 2021 09:39:54 -0600
From: Tom Lendacky <thomas.lendacky@....com>
To: Michael Roth <michael.roth@....com>, kvm@...r.kernel.org
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Sean Christopherson <seanjc@...gle.com>,
Andy Lutomirski <luto@...capital.net>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
x86@...nel.org, "H . Peter Anvin" <hpa@...or.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/3] KVM: SVM: use vmsave/vmload for saving/restoring
additional host state
On 1/5/21 8:37 AM, Michael Roth wrote:
> Using a guest workload which simply issues 'hlt' in a tight loop to
> generate VMEXITs, it was observed (on a recent EPYC processor) that a
> significant amount of the VMEXIT overhead measured on the host was the
> result of MSR reads/writes in svm_vcpu_load/svm_vcpu_put according to
> perf:
>
> 67.49%--kvm_arch_vcpu_ioctl_run
> |
> |--23.13%--vcpu_put
> | kvm_arch_vcpu_put
> | |
> | |--21.31%--native_write_msr
> | |
> | --1.27%--svm_set_cr4
> |
> |--16.11%--vcpu_load
> | |
> | --15.58%--kvm_arch_vcpu_load
> | |
> | |--13.97%--svm_set_cr4
> | | |
> | | |--12.64%--native_read_msr
>
> Most of these MSRs relate to 'syscall'/'sysenter' and segment bases, and
> can be saved/restored using 'vmsave'/'vmload' instructions rather than
> explicit MSR reads/writes. In doing so there is a significant reduction
> in the svm_vcpu_load/svm_vcpu_put overhead measured for the above
> workload:
>
> 50.92%--kvm_arch_vcpu_ioctl_run
> |
> |--19.28%--disable_nmi_singlestep
> |
> |--13.68%--vcpu_load
> | kvm_arch_vcpu_load
> | |
> | |--9.19%--svm_set_cr4
> | | |
> | | --6.44%--native_read_msr
> | |
> | --3.55%--native_write_msr
> |
> |--6.05%--kvm_inject_nmi
> |--2.80%--kvm_sev_es_mmio_read
> |--2.19%--vcpu_put
> | |
> | --1.25%--kvm_arch_vcpu_put
> | native_write_msr
>
> Quantifying this further, if we look at the raw cycle counts for a
> normal iteration of the above workload (according to 'rdtscp'),
> kvm_arch_vcpu_ioctl_run() takes ~4600 cycles from start to finish with
> the current behavior. Using 'vmsave'/'vmload', this is reduced to
> ~2800 cycles, a savings of 39%.
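
As a quick sanity check on the quoted figure, the sketch below recomputes the savings from the two approximate cycle counts above. The helper name cycle_savings_pct() is hypothetical and exists only for this illustration; it is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): percentage of cycles saved
 * when a code path drops from `before` cycles to `after` cycles.
 */
static double cycle_savings_pct(uint64_t before, uint64_t after)
{
	/* Only meaningful for a non-empty baseline and a real reduction. */
	assert(before > 0 && after <= before);

	return 100.0 * (double)(before - after) / (double)before;
}
```

With the approximate counts from the commit message, cycle_savings_pct(4600, 2800) is 1800/4600, or roughly 39.1%, consistent with the quoted 39%.
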
>
> While this approach doesn't seem to manifest in any noticeable
> improvement for more realistic workloads like UnixBench, netperf, and
> kernel builds, likely due to their exit paths generally involving IO
> with comparatively high latencies, it does improve overall overhead
> of KVM_RUN significantly, which may still be noticeable for certain
> situations. It also simplifies some aspects of the code.
>
> With this change, explicit save/restore is no longer needed for the
> following host MSRs, since they are documented[1] as being part of the
> VMCB State Save Area:
>
> MSR_STAR, MSR_LSTAR, MSR_CSTAR,
> MSR_SYSCALL_MASK, MSR_KERNEL_GS_BASE,
> MSR_IA32_SYSENTER_CS,
> MSR_IA32_SYSENTER_ESP,
> MSR_IA32_SYSENTER_EIP,
> MSR_FS_BASE, MSR_GS_BASE
>
> and only the following MSR needs individual handling in
> svm_vcpu_put/svm_vcpu_load:
>
> MSR_TSC_AUX
>
> We could drop the host_save_user_msrs array/loop and instead handle
> MSR read/write of MSR_TSC_AUX directly, but we leave that for now as
> a potential follow-up.
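
For illustration only, the follow-up mentioned above might look roughly like the user-space sketch below, which saves and restores the single remaining MSR directly instead of looping over a one-entry array. Everything prefixed with mock_ is a hypothetical stand-in (the real code would use rdmsrl()/wrmsrl() and struct vcpu_svm); only the MSR_TSC_AUX index is the architectural value.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_TSC_AUX 0xc0000103u	/* architectural index of TSC_AUX */

/* Hypothetical stand-in for the hardware register (assumption). */
static uint64_t mock_tsc_aux;

/* Hypothetical stand-ins for the kernel's rdmsrl()/wrmsrl(). */
static uint64_t mock_rdmsrl(uint32_t index)
{
	assert(index == MSR_TSC_AUX);
	return mock_tsc_aux;
}

static void mock_wrmsrl(uint32_t index, uint64_t val)
{
	assert(index == MSR_TSC_AUX);
	mock_tsc_aux = val;
}

/* Hypothetical replacement for the host_user_msrs[] array: one field. */
struct mock_vcpu {
	uint64_t host_tsc_aux;
};

/* On load: remember the host value, with no array or loop. */
static void mock_vcpu_load(struct mock_vcpu *v)
{
	v->host_tsc_aux = mock_rdmsrl(MSR_TSC_AUX);
}

/* On put: write the remembered host value back. */
static void mock_vcpu_put(struct mock_vcpu *v)
{
	mock_wrmsrl(MSR_TSC_AUX, v->host_tsc_aux);
}
```

This is a sketch of the idea, not a proposal for the actual change; in particular it ignores the sev_es_restored flag that the current array carries.
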
>
> Since 'vmsave'/'vmload' also handles the LDTR and FS/GS segment
> registers (and associated hidden state)[2], some of the code
> previously used to handle this is no longer needed, so we drop it
> as well.
>
> The first public release of the SVM spec[3] also documents the same
> handling for the host state in question, so we make these changes
> unconditionally.
>
> Also worth noting is that we 'vmsave' to the same page that is
> subsequently used by 'vmrun' to record some additional host state. This
> is okay, since, in accordance with the spec[2], the additional state
> written to the page by 'vmrun' does not overwrite any fields written by
> 'vmsave'. This has also been confirmed through testing (for the above
> CPU, at least).
>
> [1] AMD64 Architecture Programmer's Manual, Rev 3.33, Volume 2, Appendix B, Table B-2
> [2] AMD64 Architecture Programmer's Manual, Rev 3.31, Volume 3, Chapter 4, VMSAVE/VMLOAD
> [3] Secure Virtual Machine Architecture Reference Manual, Rev 3.01
>
> Suggested-by: Tom Lendacky <thomas.lendacky@....com>
> Signed-off-by: Michael Roth <michael.roth@....com>
> ---
> arch/x86/kvm/svm/svm.c | 36 +++++++-----------------------------
> arch/x86/kvm/svm/svm.h | 19 +------------------
> arch/x86/kvm/svm/vmenter.S | 10 ++++++++++
> 3 files changed, 18 insertions(+), 47 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 941e5251e13f..7a7e9b7d47a7 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1420,16 +1420,12 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> if (sev_es_guest(svm->vcpu.kvm)) {
> sev_es_vcpu_load(svm, cpu);
> } else {
> -#ifdef CONFIG_X86_64
> - rdmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base);
> -#endif
> - savesegment(fs, svm->host.fs);
> - savesegment(gs, svm->host.gs);
> - svm->host.ldt = kvm_read_ldt();
> -
> for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
> rdmsrl(host_save_user_msrs[i].index,
> svm->host_user_msrs[i]);
> +
> + asm volatile(__ex("vmsave %%"_ASM_AX)
> + : : "a" (page_to_phys(sd->save_area)) : "memory");
> }
>
> if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) {
> @@ -1461,17 +1457,6 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
> if (sev_es_guest(svm->vcpu.kvm)) {
> sev_es_vcpu_put(svm);
> } else {
> - kvm_load_ldt(svm->host.ldt);
> -#ifdef CONFIG_X86_64
> - loadsegment(fs, svm->host.fs);
> - wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
> - load_gs_index(svm->host.gs);
> -#else
> -#ifdef CONFIG_X86_32_LAZY_GS
> - loadsegment(gs, svm->host.gs);
> -#endif
> -#endif
> -
> for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
> wrmsrl(host_save_user_msrs[i].index,
> svm->host_user_msrs[i]);
> @@ -3675,7 +3660,7 @@ static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
> return EXIT_FASTPATH_NONE;
> }
>
> -void __svm_vcpu_run(unsigned long vmcb_pa, unsigned long *regs);
> +void __svm_vcpu_run(unsigned long vmcb_pa, unsigned long *regs, unsigned long hostsa_pa);
There was a follow-on fix patch to remove this forward declaration: for
SEV-ES, I had moved the declaration into svm.h without deleting it here.
I'm not sure when that fix will hit Paolo's tree.
Thanks,
Tom
>
> static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> struct vcpu_svm *svm)
> @@ -3703,16 +3688,9 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> if (sev_es_guest(svm->vcpu.kvm)) {
> __svm_sev_es_vcpu_run(svm->vmcb_pa);
> } else {
> - __svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
> -
> -#ifdef CONFIG_X86_64
> - native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
> -#else
> - loadsegment(fs, svm->host.fs);
> -#ifndef CONFIG_X86_32_LAZY_GS
> - loadsegment(gs, svm->host.gs);
> -#endif
> -#endif
> + __svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs,
> + page_to_phys(per_cpu(svm_data,
> + vcpu->cpu)->save_area));
> }
>
> /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 5431e6335e2e..1f4460508036 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -27,17 +27,6 @@ static const struct svm_host_save_msrs {
> u32 index; /* Index of the MSR */
> bool sev_es_restored; /* True if MSR is restored on SEV-ES VMEXIT */
> } host_save_user_msrs[] = {
> -#ifdef CONFIG_X86_64
> - { .index = MSR_STAR, .sev_es_restored = true },
> - { .index = MSR_LSTAR, .sev_es_restored = true },
> - { .index = MSR_CSTAR, .sev_es_restored = true },
> - { .index = MSR_SYSCALL_MASK, .sev_es_restored = true },
> - { .index = MSR_KERNEL_GS_BASE, .sev_es_restored = true },
> - { .index = MSR_FS_BASE, .sev_es_restored = true },
> -#endif
> - { .index = MSR_IA32_SYSENTER_CS, .sev_es_restored = true },
> - { .index = MSR_IA32_SYSENTER_ESP, .sev_es_restored = true },
> - { .index = MSR_IA32_SYSENTER_EIP, .sev_es_restored = true },
> { .index = MSR_TSC_AUX, .sev_es_restored = false },
> };
> #define NR_HOST_SAVE_USER_MSRS ARRAY_SIZE(host_save_user_msrs)
> @@ -130,12 +119,6 @@ struct vcpu_svm {
> u64 next_rip;
>
> u64 host_user_msrs[NR_HOST_SAVE_USER_MSRS];
> - struct {
> - u16 fs;
> - u16 gs;
> - u16 ldt;
> - u64 gs_base;
> - } host;
>
> u64 spec_ctrl;
> /*
> @@ -595,6 +578,6 @@ void sev_es_vcpu_put(struct vcpu_svm *svm);
> /* vmenter.S */
>
> void __svm_sev_es_vcpu_run(unsigned long vmcb_pa);
> -void __svm_vcpu_run(unsigned long vmcb_pa, unsigned long *regs);
> +void __svm_vcpu_run(unsigned long vmcb_pa, unsigned long *regs, unsigned long hostsa_pa);
>
> #endif
> diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
> index 6feb8c08f45a..89f4e8e7bf0e 100644
> --- a/arch/x86/kvm/svm/vmenter.S
> +++ b/arch/x86/kvm/svm/vmenter.S
> @@ -33,6 +33,7 @@
> * __svm_vcpu_run - Run a vCPU via a transition to SVM guest mode
> * @vmcb_pa: unsigned long
> * @regs: unsigned long * (to guest registers)
> + * @hostsa_pa: unsigned long
> */
> SYM_FUNC_START(__svm_vcpu_run)
> push %_ASM_BP
> @@ -47,6 +48,9 @@ SYM_FUNC_START(__svm_vcpu_run)
> #endif
> push %_ASM_BX
>
> + /* Save @hostsa_pa */
> + push %_ASM_ARG3
> +
> /* Save @regs. */
> push %_ASM_ARG2
>
> @@ -154,6 +158,12 @@ SYM_FUNC_START(__svm_vcpu_run)
> xor %r15d, %r15d
> #endif
>
> + /* "POP" @hostsa_pa to RAX. */
> + pop %_ASM_AX
> +
> + /* Restore host user state and FS/GS base */
> + vmload %_ASM_AX
> +
> pop %_ASM_BX
>
> #ifdef CONFIG_X86_64
>