Message-ID: <Yj5bCw0q5n4ZgSuU@google.com>
Date: Sat, 26 Mar 2022 00:15:07 +0000
From: Sean Christopherson <seanjc@...gle.com>
To: Jon Kohler <jon@...anix.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: x86: optimize PKU branching in kvm_load_{guest|host}_xsave_state
On Wed, Mar 23, 2022, Jon Kohler wrote:
> kvm_load_{guest|host}_xsave_state handles xsave on vm entry and exit,
> part of which is managing memory protection key state. The latest
> arch.pkru is updated with a rdpkru, and if that doesn't match the base
> host_pkru (which is the case about 70% of the time), we issue a __write_pkru.
>
> To improve performance, implement the following optimizations:
> 1. Reorder if conditions prior to wrpkru in both
> kvm_load_{guest|host}_xsave_state.
>
> Flip the ordering of the || condition so that XFEATURE_MASK_PKRU is
> checked first, which, when instrumented in our environment, appeared
> to be always true and is less work overall than kvm_read_cr4_bits().
If it's always true, then it should be checked last, not first. And if
kvm_read_cr4_bits() is more expensive, then we should address that issue, not
try to micro-optimize this stuff. X86_CR4_PKE can't be guest-owned, and so
kvm_read_cr4_bits() should be optimized down to:

	return vcpu->arch.cr4 & X86_CR4_PKE;

If the compiler isn't smart enough to do that on its own, then we should rework
kvm_read_cr4_bits(), as that will benefit multiple code paths.
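
For reference, kvm_read_cr4_bits() is roughly the below (paraphrased from
memory, the exact caching-helper details may differ):

	static inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu, ulong mask)
	{
		ulong tmask = mask & KVM_POSSIBLE_CR4_GUEST_BITS;

		/* Refresh CR4 from hardware only if a requested bit can be guest-owned. */
		if (tmask & vcpu->arch.cr4_guest_owned_bits)
			static_call(kvm_x86_cache_reg)(vcpu, VCPU_EXREG_CR4);

		return vcpu->arch.cr4 & mask;
	}

X86_CR4_PKE isn't in KVM_POSSIBLE_CR4_GUEST_BITS, so with a constant
X86_CR4_PKE mask, tmask should be '0' at compile time, the static_call
should be dropped, and the whole thing should collapse to the single
vcpu->arch.cr4 load shown above.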
> For kvm_load_guest_xsave_state, hoist arch.pkru != host_pkru ahead
> one position. When instrumented, I saw this to be true roughly ~70% of
> the time, vs the other conditions, which were almost always true.
> With this change, we avoid the 3rd condition check ~30% of the time.
If the guest uses PKRU... If PKRU is used by the host but not the guest, the
early comparison is pure overhead, because it will almost always be true (the
guest's PKRU will be zero, the host's will be non-zero).
> 2. Wrap PKU sections with CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS,
> so that if the user compiles out this feature, we do not have
> these branches at all.
Not that it really matters, since static_cpu_has() will patch out all the branches,
and in practice who cares about a JMP or NOP(s)? But...
#ifdeffery is the wrong way to handle this. Replace static_cpu_has() with
cpu_feature_enabled(); that'll boil down to a '0' and omit all the code when
CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=n, without the #ifdef ugliness.
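
E.g. something like this (untested sketch, keeping the existing condition
order; it assumes X86_FEATURE_PKU is in the disabled-features mask when
CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=n, so cpu_feature_enabled() compiles
down to '0' and the dead code is dropped):

	/* PKRU swap is compiled out entirely when PKU support is disabled. */
	if (cpu_feature_enabled(X86_FEATURE_PKU) &&
	    (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
	     (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU)) &&
	    vcpu->arch.pkru != vcpu->arch.host_pkru)
		write_pkru(vcpu->arch.pkru);

and similarly for the host path, with no #ifdef needed in either case.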
> Signed-off-by: Jon Kohler <jon@...anix.com>
> ---
> arch/x86/kvm/x86.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6db3a506b402..2b00123a5d50 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -950,11 +950,13 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
> wrmsrl(MSR_IA32_XSS, vcpu->arch.ia32_xss);
> }
>
> +#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> if (static_cpu_has(X86_FEATURE_PKU) &&
> - (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
> - (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU)) &&
> - vcpu->arch.pkru != vcpu->arch.host_pkru)
> + vcpu->arch.pkru != vcpu->arch.host_pkru &&
> + ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
> + kvm_read_cr4_bits(vcpu, X86_CR4_PKE)))
> write_pkru(vcpu->arch.pkru);
> +#endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
> }
> EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state);
>
> @@ -963,13 +965,15 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
> if (vcpu->arch.guest_state_protected)
> return;
>
> +#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> if (static_cpu_has(X86_FEATURE_PKU) &&
> - (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
> - (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU))) {
> + ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
> + kvm_read_cr4_bits(vcpu, X86_CR4_PKE))) {
> vcpu->arch.pkru = rdpkru();
> if (vcpu->arch.pkru != vcpu->arch.host_pkru)
> write_pkru(vcpu->arch.host_pkru);
> }
> +#endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
>
> if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) {
>
> --
> 2.30.1 (Apple Git-130)
>