[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BN9PR11MB5276220CE6BE21025797A7BC8C459@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Thu, 30 Dec 2021 02:28:25 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: "Christopherson,, Sean" <seanjc@...gle.com>,
"Zhong, Yang" <yang.zhong@...el.com>
CC: "x86@...nel.org" <x86@...nel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"corbet@....net" <corbet@....net>,
"shuah@...nel.org" <shuah@...nel.org>,
"Nakajima, Jun" <jun.nakajima@...el.com>,
"jing2.liu@...ux.intel.com" <jing2.liu@...ux.intel.com>,
"Liu, Jing2" <jing2.liu@...el.com>,
"Zeng, Guang" <guang.zeng@...el.com>,
"Wang, Wei W" <wei.w.wang@...el.com>
Subject: RE: [PATCH v4 08/21] kvm: x86: Check and enable permitted dynamic
xfeatures at KVM_SET_CPUID2
> From: Sean Christopherson <seanjc@...gle.com>
> Sent: Thursday, December 30, 2021 12:55 AM
>
> On Wed, Dec 29, 2021, Yang Zhong wrote:
> > From: Jing Liu <jing2.liu@...el.com>
> >
> > Guest xstate permissions should be set by userspace VMM before vcpu
> > creation. Extend KVM_SET_CPUID2 to verify that every feature reported
> > in CPUID[0xD] has proper permission set. If permission allows, enable
> > all xfeatures in guest cpuid with fpstate buffer sized accordingly.
> >
> > This avoids introducing new KVM exit reason for reporting permission
> > violation to userspace VMM at run-time and also removes the need of
> > tricky fpstate buffer expansion in the emulation and restore path of
> > XCR0 and IA32_XFD MSR.
>
> How so? __do_cpuid_func() restricts what is advertised to userspace based
> on
> xstate_get_guest_group_perm(), so it's not like KVM is advertising something
> it
> can't provide? There should never be any danger to KVM that's mitigated by
> restricing guest CPUID because KVM can and should check vcpu-
> >arch.guest_fpu.perm
> instead of guest CPUID.
Well, above explains why we choose to expand fpstate buffer at
KVM_SET_CPUID2 instead of in the emulation path when required
permissions have been set, as discussed here:
https://lore.kernel.org/all/20211214024948.048572883@linutronix.de/
>
> In other words, I believe you're conflating the overall approach of requiring
> userspace to pre-acquire the necessary permissions with enforcing what
> userspace
> advertises to the guest.
>
> > Signed-off-by: Jing Liu <jing2.liu@...el.com>
> > Signed-off-by: Kevin Tian <kevin.tian@...el.com>
> > Signed-off-by: Yang Zhong <yang.zhong@...el.com>
> > ---
> > arch/x86/kvm/cpuid.c | 62 +++++++++++++++++++++++++++++++++----------
> -
> > 1 file changed, 47 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 4855344091b8..acbc10db550e 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -81,9 +81,12 @@ static inline struct kvm_cpuid_entry2
> *cpuid_entry2_find(
> > return NULL;
> > }
> >
> > -static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
> > +static int kvm_check_cpuid(struct kvm_vcpu *vcpu,
> > + struct kvm_cpuid_entry2 *entries,
> > + int nent)
> > {
> > struct kvm_cpuid_entry2 *best;
> > + int r = 0;
> >
> > /*
> > * The existing code assumes virtual address is 48-bit or 57-bit in the
> > @@ -93,11 +96,40 @@ static int kvm_check_cpuid(struct
> kvm_cpuid_entry2 *entries, int nent)
> > if (best) {
> > int vaddr_bits = (best->eax & 0xff00) >> 8;
> >
> > - if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0)
> > - return -EINVAL;
> > + if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0) {
> > + r = -EINVAL;
> > + goto out;
>
> Please don't change this to a goto, a return is perfectly ok and more readable
> as it doesn't imply there's some functional change that needs to be unwound
> at
> the end.
will fix
>
> > + }
> > }
> >
> > - return 0;
> > + /*
> > + * Check guest permissions for dynamically-enabled xfeatures.
> > + * Userspace VMM is expected to acquire permission before vCPU
> > + * creation. If permission allows, enable all xfeatures with
> > + * fpstate buffer sized accordingly. This avoids complexity of
> > + * run-time expansion in the emulation and restore path of XCR0
> > + * and IA32_XFD MSR.
> > + */
> > + best = cpuid_entry2_find(entries, nent, 0xd, 0);
> > + if (best) {
> > + u64 xfeatures;
> > +
> > + xfeatures = best->eax | ((u64)best->edx << 32);
> > + if (xfeatures & ~vcpu->arch.guest_fpu.perm) {
> > + r = -ENXIO;
>
> ENXIO is a rather odd error code for insufficient permissions, especially since
> the FPU returns -EPERM for what is effectively the same check.
>
> > + goto out;
> > + }
> > +
> > + if (xfeatures != vcpu->arch.guest_fpu.xfeatures) {
>
> xfeatures is obviously not consumed anywhere, which is super confusing and
> arguably wrong, e.g. if userspace advertises xfeatures that are a subset of
> vcpu->arch.guest_fpu.perm, this will expand XSAVE state beyond what
> userspace
> actually wants to advertise to the guest. The really confusing case would be
> if
> userspace reduced xfeatures relative to vcpu->arch.guest_fpu.xfeatures and
> got
> an -ENOMEM due to the FPU failing to expand the XSAVE size.
You are right.
>
> I don't care about the waste of memory, and IIUC userspace would have to
> intentionally request permissions for the guest that it then ignores, but that
> doesn't make the code any less confusing. And as written, this check also
> prevents
> advertising non-XFD features that are not supported in hardware. I doubt
> there's
> a production use case for that (though MPX deprecation comes close), but
> I've
> certainly exposed unsupported features to a guest for testing purposes.
>
> Rather than bleed details from the FPU into KVM, why not have the FPU do
> any and
> all checks? That also gives the FPU access to requested xfeatures so that it
> can opportunistically avoid unnecessary expansion. We can also tweak the
> kernel
> APIs to be more particular about input values.
All above makes sense, especially when we combine permission check
and buffer expansion in one step now.
>
> At that point, I would be ok with fpu_update_guest_perm_features()
> rejecting
> attempts to advertise features that are not permitted, because then it's an
> FPU
> policy, not a KVM policy, and there's a valid reason for said policy. It's a bit
> of a pedantic distinction, but to me it matters because having KVM explicitly
> restrict guest CPUID implies that doing so is necessary for KVM correctness,
> which
> AFAICT is not the case.
>
> E.g. in KVM
>
> /*
> * Exposing dynamic xfeatures to the guest requires additional
> enabling
> * in the FPU, e.g. to expand the guest XSAVE state size.
> */
> best = cpuid_entry2_find(entries, nent, 0xd, 0);
> if (!best)
> return 0;
>
> xfeatures = best->eax | ((u64)best->edx << 32);
> xfeatures &= XFEATURE_MASK_USER_DYNAMIC;
> if (!xfeatures)
> return 0;
>
> return fpu_enable_guest_xfd_features(&vcpu->arch.guest_fpu,
> xfeatures);
>
> and then
>
> int fpu_enable_guest_xfd_features(struct fpu_guest *guest_fpu, u64
> xfeatures)
> {
> lockdep_assert_preemption_enabled();
>
> /* Nothing to do if all requested features are already enabled. */
> xfeatures &= ~guest_fpu->xfeatures;
> if (!xfeatures)
> return 0;
>
> /* Dynamic xfeatures are not supported with 32-bit kernels. */
> if (!IS_ENABLED(CONFIG_X86_64))
> return -EPERM;
>
> return __xfd_enable_feature(xfeatures, guest_fpu);
> }
>
> with
>
> int __xfd_enable_feature(u64 xfd_err, struct fpu_guest *guest_fpu)
> {
> struct fpu_state_perm *perm;
> unsigned int ksize, usize;
> struct fpu *fpu;
>
> if (WARN_ON_ONCE(!xfd_err || (xfd_err &
> ~XFEATURE_MASK_USER_DYNAMIC)))
> return 0;
Currently this is done as:
int __xfd_enable_feature(u64 xfd_err, struct fpu_guest *guest_fpu)
{
u64 xfd_event = xfd_err & XFEATURE_MASK_USER_DYNAMIC;
...
if (!xfd_event) {
if (!guest_fpu)
pr_err_once("XFD: Invalid xfd error: %016llx\n", xfd_err);
return 0;
}
...
}
is it necessary to convert the error print to WARN_ON() (and also
apply to guest_fpu)?
>
> ...
> }
>
> which addresses several things:
>
> a) avoids explicitly restricing guest CPUID in KVM, and in particular doesn't
> prevent userspace from advertising non-XFD features that aren't
> supported in
> hardware, which for better or worse is allowed today.
>
> b) returns -EPERM instead of '0' when userspace attempts to enable
> dynamic
> xfeatures with 32-bit kernels, which isn't a bug as posted only because
> KVM pre-checks vcpu->arch.guest_fpu.perm.
>
> b) avoids reading guest_perm outside of siglock, which was technically a
> TOCTOU
> "bug", though it didn't put the kernel at risk because
> __xstate_request_perm()
> doesn't allow reducing permissions.
>
> c) allows __xfd_enable_feature() to require the caller to provide just XFD
> features
>
All the sample code looks good to me, except WARN_ON() introduced in
the last bullet.
If no objection from other maintainers, we can incorporate it in next version.
Thanks
Kevin
Powered by blists - more mailing lists