linux-kernel - RE: [PATCH v4 08/21] kvm: x86: Check and enable permitted dynamic xfeatures at KVM_SET

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BN9PR11MB5276220CE6BE21025797A7BC8C459@BN9PR11MB5276.namprd11.prod.outlook.com>
Date:   Thu, 30 Dec 2021 02:28:25 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     "Christopherson,, Sean" <seanjc@...gle.com>,
        "Zhong, Yang" <yang.zhong@...el.com>
CC:     "x86@...nel.org" <x86@...nel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        "linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "corbet@....net" <corbet@....net>,
        "shuah@...nel.org" <shuah@...nel.org>,
        "Nakajima, Jun" <jun.nakajima@...el.com>,
        "jing2.liu@...ux.intel.com" <jing2.liu@...ux.intel.com>,
        "Liu, Jing2" <jing2.liu@...el.com>,
        "Zeng, Guang" <guang.zeng@...el.com>,
        "Wang, Wei W" <wei.w.wang@...el.com>
Subject: RE: [PATCH v4 08/21] kvm: x86: Check and enable permitted dynamic
 xfeatures at KVM_SET_CPUID2

> From: Sean Christopherson <seanjc@...gle.com>
> Sent: Thursday, December 30, 2021 12:55 AM
> 
> On Wed, Dec 29, 2021, Yang Zhong wrote:
> > From: Jing Liu <jing2.liu@...el.com>
> >
> > Guest xstate permissions should be set by userspace VMM before vcpu
> > creation. Extend KVM_SET_CPUID2 to verify that every feature reported
> > in CPUID[0xD] has proper permission set. If permission allows, enable
> > all xfeatures in guest cpuid with fpstate buffer sized accordingly.
> >
> > This avoids introducing new KVM exit reason for reporting permission
> > violation to userspace VMM at run-time and also removes the need of
> > tricky fpstate buffer expansion in the emulation and restore path of
> > XCR0 and IA32_XFD MSR.
> 
> How so?  __do_cpuid_func() restricts what is advertised to userspace based
> on
> xstate_get_guest_group_perm(), so it's not like KVM is advertising something
> it
> can't provide?  There should never be any danger to KVM that's mitigated by
> restricing guest CPUID because KVM can and should check vcpu-
> >arch.guest_fpu.perm
> instead of guest CPUID.

Well, above explains why we choose to expand fpstate buffer at 
KVM_SET_CPUID2 instead of in the emulation path when required
permissions have been set, as discussed here:

https://lore.kernel.org/all/20211214024948.048572883@linutronix.de/

> 
> In other words, I believe you're conflating the overall approach of requiring
> userspace to pre-acquire the necessary permissions with enforcing what
> userspace
> advertises to the guest.
> 
> > Signed-off-by: Jing Liu <jing2.liu@...el.com>
> > Signed-off-by: Kevin Tian <kevin.tian@...el.com>
> > Signed-off-by: Yang Zhong <yang.zhong@...el.com>
> > ---
> >  arch/x86/kvm/cpuid.c | 62 +++++++++++++++++++++++++++++++++----------
> -
> >  1 file changed, 47 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 4855344091b8..acbc10db550e 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -81,9 +81,12 @@ static inline struct kvm_cpuid_entry2
> *cpuid_entry2_find(
> >  	return NULL;
> >  }
> >
> > -static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
> > +static int kvm_check_cpuid(struct kvm_vcpu *vcpu,
> > +			   struct kvm_cpuid_entry2 *entries,
> > +			   int nent)
> >  {
> >  	struct kvm_cpuid_entry2 *best;
> > +	int r = 0;
> >
> >  	/*
> >  	 * The existing code assumes virtual address is 48-bit or 57-bit in the
> > @@ -93,11 +96,40 @@ static int kvm_check_cpuid(struct
> kvm_cpuid_entry2 *entries, int nent)
> >  	if (best) {
> >  		int vaddr_bits = (best->eax & 0xff00) >> 8;
> >
> > -		if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0)
> > -			return -EINVAL;
> > +		if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0) {
> > +			r = -EINVAL;
> > +			goto out;
> 
> Please don't change this to a goto, a return is perfectly ok and more readable
> as it doesn't imply there's some functional change that needs to be unwound
> at
> the end.

will fix

> 
> > +		}
> >  	}
> >
> > -	return 0;
> > +	/*
> > +	 * Check guest permissions for dynamically-enabled xfeatures.
> > +	 * Userspace VMM is expected to acquire permission before vCPU
> > +	 * creation. If permission allows, enable all xfeatures with
> > +	 * fpstate buffer sized accordingly. This avoids complexity of
> > +	 * run-time expansion in the emulation and restore path of XCR0
> > +	 * and IA32_XFD MSR.
> > +	 */
> > +	best = cpuid_entry2_find(entries, nent, 0xd, 0);
> > +	if (best) {
> > +		u64 xfeatures;
> > +
> > +		xfeatures = best->eax | ((u64)best->edx << 32);
> > +		if (xfeatures & ~vcpu->arch.guest_fpu.perm) {
> > +			r = -ENXIO;
> 
> ENXIO is a rather odd error code for insufficient permissions, especially since
> the FPU returns -EPERM for what is effectively the same check.
> 
> > +			goto out;
> > +		}
> > +
> > +		if (xfeatures != vcpu->arch.guest_fpu.xfeatures) {
> 
> xfeatures is obviously not consumed anywhere, which is super confusing and
> arguably wrong, e.g. if userspace advertises xfeatures that are a subset of
> vcpu->arch.guest_fpu.perm, this will expand XSAVE state beyond what
> userspace
> actually wants to advertise to the guest.  The really confusing case would be
> if
> userspace reduced xfeatures relative to vcpu->arch.guest_fpu.xfeatures and
> got
> an -ENOMEM due to the FPU failing to expand the XSAVE size.

You are right.

> 
> I don't care about the waste of memory, and IIUC userspace would have to
> intentionally request permissions for the guest that it then ignores, but that
> doesn't make the code any less confusing.  And as written, this check also
> prevents
> advertising non-XFD features that are not supported in hardware.  I doubt
> there's
> a production use case for that (though MPX deprecation comes close), but
> I've
> certainly exposed unsupported features to a guest for testing purposes.
> 
> Rather than bleed details from the FPU into KVM, why not have the FPU do
> any and
> all checks?  That also gives the FPU access to requested xfeatures so that it
> can opportunistically avoid unnecessary expansion.  We can also tweak the
> kernel
> APIs to be more particular about input values.

All above makes sense, especially when we combine permission check
and buffer expansion in one step now.

> 
> At that point, I would be ok with fpu_update_guest_perm_features()
> rejecting
> attempts to advertise features that are not permitted, because then it's an
> FPU
> policy, not a KVM policy, and there's a valid reason for said policy.  It's a bit
> of a pedantic distinction, but to me it matters because having KVM explicitly
> restrict guest CPUID implies that doing so is necessary for KVM correctness,
> which
> AFAICT is not the case.
> 
> E.g. in KVM
> 
> 	/*
> 	 * Exposing dynamic xfeatures to the guest requires additional
> enabling
> 	 * in the FPU, e.g. to expand the guest XSAVE state size.
> 	 */
> 	best = cpuid_entry2_find(entries, nent, 0xd, 0);
> 	if (!best)
> 		return 0;
> 
> 	xfeatures = best->eax | ((u64)best->edx << 32);
> 	xfeatures &= XFEATURE_MASK_USER_DYNAMIC;
> 	if (!xfeatures)
> 		return 0;
> 
> 	return fpu_enable_guest_xfd_features(&vcpu->arch.guest_fpu,
> xfeatures);
> 
> and then
> 
>   int fpu_enable_guest_xfd_features(struct fpu_guest *guest_fpu, u64
> xfeatures)
>   {
> 	lockdep_assert_preemption_enabled();
> 
> 	/* Nothing to do if all requested features are already enabled. */
> 	xfeatures &= ~guest_fpu->xfeatures;
> 	if (!xfeatures)
> 		return 0;
> 
> 	/* Dynamic xfeatures are not supported with 32-bit kernels. */
> 	if (!IS_ENABLED(CONFIG_X86_64))
> 		return -EPERM;
> 
> 	return __xfd_enable_feature(xfeatures, guest_fpu);
>   }
> 
> with
> 
>   int __xfd_enable_feature(u64 xfd_err, struct fpu_guest *guest_fpu)
>   {
> 	struct fpu_state_perm *perm;
> 	unsigned int ksize, usize;
> 	struct fpu *fpu;
> 
> 	if (WARN_ON_ONCE(!xfd_err || (xfd_err &
> ~XFEATURE_MASK_USER_DYNAMIC)))
> 		return 0;

Currently this is done as:

int __xfd_enable_feature(u64 xfd_err, struct fpu_guest *guest_fpu)
 {
 	u64 xfd_event = xfd_err & XFEATURE_MASK_USER_DYNAMIC;

	...
 	if (!xfd_event) {
		if (!guest_fpu)
			pr_err_once("XFD: Invalid xfd error: %016llx\n", xfd_err);
 		return 0;
 	}
	...
}

is it necessary to convert the error print to WARN_ON() (and also
apply to guest_fpu)?

> 
> 	...
>   }
> 
> which addresses several things:
> 
>   a) avoids explicitly restricing guest CPUID in KVM, and in particular doesn't
>      prevent userspace from advertising non-XFD features that aren't
> supported in
>      hardware, which for better or worse is allowed today.
> 
>   b) returns -EPERM instead of '0' when userspace attempts to enable
> dynamic
>      xfeatures with 32-bit kernels, which isn't a bug as posted only because
>      KVM pre-checks vcpu->arch.guest_fpu.perm.
> 
>   b) avoids reading guest_perm outside of siglock, which was technically a
> TOCTOU
>      "bug", though it didn't put the kernel at risk because
> __xstate_request_perm()
>      doesn't allow reducing permissions.
> 
>   c) allows __xfd_enable_feature() to require the caller to provide just XFD
>      features
> 

All the sample code looks good to me, except WARN_ON() introduced in
the last bullet.

If no objection from other maintainers, we can incorporate it in next version.

Thanks
Kevin