Date:   Sat, 26 Mar 2022 00:15:07 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Jon Kohler <jon@...anix.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: x86: optimize PKU branching in
 kvm_load_{guest|host}_xsave_state

On Wed, Mar 23, 2022, Jon Kohler wrote:
> kvm_load_{guest|host}_xsave_state handles xsave on vm entry and exit,
> part of which is managing memory protection key state. The latest
> arch.pkru is updated with a rdpkru, and if that doesn't match the base
> host_pkru (which is the case about 70% of the time), we issue a
> __write_pkru.
> 
> To improve performance, implement the following optimizations:
>  1. Reorder if conditions prior to wrpkru in both
>     kvm_load_{guest|host}_xsave_state.
> 
>     Flip the ordering of the || condition so that XFEATURE_MASK_PKRU is
>     checked first, which, when instrumented in our environment, appeared
>     to be always true and is less overall work than kvm_read_cr4_bits.

If it's always true, then it should be checked last, not first.  And if
kvm_read_cr4_bits() is more expensive, then we should address that issue, not
try to micro-optimize this stuff.  X86_CR4_PKE can't be guest-owned, and so
kvm_read_cr4_bits() should be optimized down to:

	return vcpu->arch.cr4 & X86_CR4_PKE;

If the compiler isn't smart enough to do that on its own then we should rework
kvm_read_cr4_bits() as that will benefit multiple code paths.
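
Something along these lines, for example (a rough sketch; the
KVM_POSSIBLE_CR4_GUEST_BITS / cache_reg plumbing below is from memory and may
not exactly match what's in kvm_cache_regs.h):

	static inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu, ulong mask)
	{
		/*
		 * X86_CR4_PKE is never guest-owned, so for a constant mask of
		 * only non-guest-owned bits the first check is compile-time
		 * false, the cache refresh is dead code, and this boils down
		 * to a plain 'vcpu->arch.cr4 & mask'.
		 */
		if ((mask & KVM_POSSIBLE_CR4_GUEST_BITS) &&
		    (mask & vcpu->arch.cr4_guest_owned_bits) &&
		    !kvm_register_is_available(vcpu, VCPU_EXREG_CR4))
			static_call(kvm_x86_cache_reg)(vcpu, VCPU_EXREG_CR4);

		return vcpu->arch.cr4 & mask;
	}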

>     For kvm_load_guest_xsave_state, hoist arch.pkru != host_pkru ahead
>     one position. When instrumented, I saw this be true roughly ~70% of
>     the time, vs the other conditions, which were almost always true.
>     With this change, we avoid the 3rd condition check ~30% of the time.

If the guest uses PKRU...  If PKRU is used by the host but not the guest, the
early comparison is pure overhead because it will almost always be true (guest
will be zero, host will be non-zero).
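
To spell that case out against the proposed guest-entry check (same code as
the diff below, just annotated for a guest that never enables PKU while the
host uses it):

	if (static_cpu_has(X86_FEATURE_PKU) &&
	    vcpu->arch.pkru != vcpu->arch.host_pkru &&     /* ~always true: 0 vs. host value */
	    ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||     /* false: XCR0.PKRU never set */
	     kvm_read_cr4_bits(vcpu, X86_CR4_PKE)))        /* false: CR4.PKE never set */
		write_pkru(vcpu->arch.pkru);

All three checks run on every entry for such a guest, whereas the current
ordering never touches the pkru fields at all once the CR4/XCR0 pair fails.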

>  2. Wrap PKU sections with CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS,
>     as if the user compiles out this feature, we should not have
>     these branches at all.

Not that it really matters, since static_cpu_has() will patch out all the branches,
and in practice who cares about a JMP or NOP(s)?  But...

#ifdeffery is the wrong way to handle this.  Replace static_cpu_has() with
cpu_feature_enabled(); that'll boil down to a '0' and omit all the code when
CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=n, without the #ifdef ugliness.
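
E.g. for the guest-entry side, keeping the current condition order (sketch):

	/*
	 * cpu_feature_enabled(X86_FEATURE_PKU) evaluates to a compile-time
	 * false when CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=n (PKU is in the
	 * disabled-features mask), so the compiler drops the entire block
	 * without any #ifdef.
	 */
	if (cpu_feature_enabled(X86_FEATURE_PKU) &&
	    (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
	     (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU)) &&
	    vcpu->arch.pkru != vcpu->arch.host_pkru)
		write_pkru(vcpu->arch.pkru);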

> Signed-off-by: Jon Kohler <jon@...anix.com>
> ---
>  arch/x86/kvm/x86.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6db3a506b402..2b00123a5d50 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -950,11 +950,13 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
>  			wrmsrl(MSR_IA32_XSS, vcpu->arch.ia32_xss);
>  	}
> 
> +#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
>  	if (static_cpu_has(X86_FEATURE_PKU) &&
> -	    (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
> -	     (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU)) &&
> -	    vcpu->arch.pkru != vcpu->arch.host_pkru)
> +	    vcpu->arch.pkru != vcpu->arch.host_pkru &&
> +	    ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
> +	     kvm_read_cr4_bits(vcpu, X86_CR4_PKE)))
>  		write_pkru(vcpu->arch.pkru);
> +#endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
>  }
>  EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state);
> 
> @@ -963,13 +965,15 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
>  	if (vcpu->arch.guest_state_protected)
>  		return;
> 
> +#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
>  	if (static_cpu_has(X86_FEATURE_PKU) &&
> -	    (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
> -	     (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU))) {
> +	    ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
> +	     kvm_read_cr4_bits(vcpu, X86_CR4_PKE))) {
>  		vcpu->arch.pkru = rdpkru();
>  		if (vcpu->arch.pkru != vcpu->arch.host_pkru)
>  			write_pkru(vcpu->arch.host_pkru);
>  	}
> +#endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
> 
>  	if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) {
> 
> --
> 2.30.1 (Apple Git-130)
> 
