Message-ID: <20200224225743.GP29865@linux.intel.com>
Date: Mon, 24 Feb 2020 14:57:43 -0800
From: Sean Christopherson <sean.j.christopherson@...el.com>
To: Vitaly Kuznetsov <vkuznets@...hat.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 38/61] KVM: x86: Introduce kvm_cpu_caps to replace
runtime CPUID masking

On Mon, Feb 24, 2020 at 05:32:54PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@...el.com> writes:
>
> > Calculate the CPUID masks for KVM_GET_SUPPORTED_CPUID at load time using
> > what is effectively a KVM-adjusted copy of boot_cpu_data, or more
> > precisely, the x86_capability array in boot_cpu_data.
> >
> > In terms of KVM support, the vast majority of CPUID feature bits are
> > constant, and *all* feature support is known at KVM load time. Rather
> > than apply boot_cpu_data, which is effectively read-only after init,
> > at runtime, copy it into a KVM-specific array and use *that* to mask
> > CPUID registers.
> >
> > In addition to consolidating the masking, kvm_cpu_caps can be adjusted
> > by SVM/VMX at load time, thus eliminating all feature bit manipulation
> > in ->set_supported_cpuid().
> >
> > No functional change intended.
> >
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@...el.com>
> > ---
> > arch/x86/kvm/cpuid.c | 229 +++++++++++++++++++++++--------------------
> > arch/x86/kvm/cpuid.h | 19 ++++
> > arch/x86/kvm/x86.c | 2 +
> > 3 files changed, 142 insertions(+), 108 deletions(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 20a7af320291..c2a4c9df49a9 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -24,6 +24,13 @@
> > #include "trace.h"
> > #include "pmu.h"
> >
> > +/*
> > + * Unlike "struct cpuinfo_x86.x86_capability", kvm_cpu_caps doesn't need to be
> > + * aligned to sizeof(unsigned long) because it's not accessed via bitops.
> > + */
> > +u32 kvm_cpu_caps[NCAPINTS] __read_mostly;
> > +EXPORT_SYMBOL_GPL(kvm_cpu_caps);
> > +
> > static u32 xstate_required_size(u64 xstate_bv, bool compacted)
> > {
> > int feature_bit = 0;
> > @@ -259,7 +266,119 @@ static __always_inline void cpuid_entry_mask(struct kvm_cpuid_entry2 *entry,
> > {
> > u32 *reg = cpuid_entry_get_reg(entry, leaf * 32);
> >
> > - *reg &= boot_cpu_data.x86_capability[leaf];
> > + BUILD_BUG_ON(leaf > ARRAY_SIZE(kvm_cpu_caps));
>
> Should this be '>=' ?

Yep, nice catch.
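
For reference, the helper with the fixed check would presumably end up looking
like this (a sketch only; same code as above with '>' changed to '>=', and the
enum cpuid_leafs parameter inferred from the kvm_cpu_cap_mask() helper below):

static __always_inline void cpuid_entry_mask(struct kvm_cpuid_entry2 *entry,
					     enum cpuid_leafs leaf)
{
	u32 *reg = cpuid_entry_get_reg(entry, leaf * 32);

	/*
	 * kvm_cpu_caps has NCAPINTS entries, so the last valid index is
	 * NCAPINTS - 1.  leaf == ARRAY_SIZE(kvm_cpu_caps) must be rejected
	 * as well, hence '>=' instead of '>'.
	 */
	BUILD_BUG_ON(leaf >= ARRAY_SIZE(kvm_cpu_caps));
	*reg &= kvm_cpu_caps[leaf];
}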
> > + *reg &= kvm_cpu_caps[leaf];
> > +}
> > +
> > +static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
> > +{
> > + reverse_cpuid_check(leaf);
> > + kvm_cpu_caps[leaf] &= mask;
> > +}
> > +
> > +void kvm_set_cpu_caps(void)
> > +{
> > + unsigned f_nx = is_efer_nx() ? F(NX) : 0;
> > +#ifdef CONFIG_X86_64
> > + unsigned f_gbpages = F(GBPAGES);
> > + unsigned f_lm = F(LM);
> > +#else
> > + unsigned f_gbpages = 0;
> > + unsigned f_lm = 0;
> > +#endif
>
> Three too many bare 'unsigned's :-)

Roger that, I'll fix this up.
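
I.e., a sketch of the trivial fixup, spelling out 'unsigned int':

	unsigned int f_nx = is_efer_nx() ? F(NX) : 0;
#ifdef CONFIG_X86_64
	/* GBPAGES and LM are only advertised on 64-bit hosts. */
	unsigned int f_gbpages = F(GBPAGES);
	unsigned int f_lm = F(LM);
#else
	unsigned int f_gbpages = 0;
	unsigned int f_lm = 0;
#endif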
> > +
> > + BUILD_BUG_ON(sizeof(kvm_cpu_caps) >
> > + sizeof(boot_cpu_data.x86_capability));
> > +
> > + memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
> > + sizeof(kvm_cpu_caps));
> > +
> > + kvm_cpu_cap_mask(CPUID_1_EDX,
> > + F(FPU) | F(VME) | F(DE) | F(PSE) |
> > + F(TSC) | F(MSR) | F(PAE) | F(MCE) |
> > + F(CX8) | F(APIC) | 0 /* Reserved */ | F(SEP) |
> > + F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
> > + F(PAT) | F(PSE36) | 0 /* PSN */ | F(CLFLUSH) |
> > + 0 /* Reserved, DS, ACPI */ | F(MMX) |
> > + F(FXSR) | F(XMM) | F(XMM2) | F(SELFSNOOP) |
> > + 0 /* HTT, TM, Reserved, PBE */
> > + );
> > +
> > + kvm_cpu_cap_mask(CPUID_8000_0001_EDX,
> > + F(FPU) | F(VME) | F(DE) | F(PSE) |
> > + F(TSC) | F(MSR) | F(PAE) | F(MCE) |
> > + F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) |
> > + F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
> > + F(PAT) | F(PSE36) | 0 /* Reserved */ |
> > + f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
> > + F(FXSR) | F(FXSR_OPT) | f_gbpages | F(RDTSCP) |
> > + 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW)
> > + );
> > +
> > + kvm_cpu_cap_mask(CPUID_1_ECX,
> > + /* NOTE: MONITOR (and MWAIT) are emulated as NOP,
> > + * but *not* advertised to guests via CPUID ! */
> > + F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
> > + 0 /* DS-CPL, VMX, SMX, EST */ |
> > + 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
> > + F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
> > + F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
> > + F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
> > + 0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
> > + F(F16C) | F(RDRAND)
> > + );
>
> I would suggest we order things by CPUID_NUM here, i.e.
>
> CPUID_1_ECX
> CPUID_1_EDX
> CPUID_7_1_EAX
> CPUID_7_0_EBX
> CPUID_7_ECX
> CPUID_7_EDX
> CPUID_D_1_EAX
> ...

Hmm, generally speaking I agree, but I didn't want to change the ordering
in this patch when moving the code.  Throw a patch on top?  Leave as is?
Something else?