lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BN9PR11MB5276F8C1F89911D4E04F8A9C8C779@BN9PR11MB5276.namprd11.prod.outlook.com>
Date:   Thu, 16 Dec 2021 05:36:08 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     Paolo Bonzini <pbonzini@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Wang, Wei W" <wei.w.wang@...el.com>,
        "quintela@...hat.com" <quintela@...hat.com>
CC:     LKML <linux-kernel@...r.kernel.org>,
        "Dr. David Alan Gilbert" <dgilbert@...hat.com>,
        Jing Liu <jing2.liu@...ux.intel.com>,
        "Zhong, Yang" <yang.zhong@...el.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "Christopherson,, Sean" <seanjc@...gle.com>,
        "Nakajima, Jun" <jun.nakajima@...el.com>,
        "Zeng, Guang" <guang.zeng@...el.com>
Subject: RE: [patch 5/6] x86/fpu: Provide fpu_update_guest_xcr0/xfd()

> From: Tian, Kevin
> Sent: Thursday, December 16, 2021 9:00 AM
> >
> > Off-list, Thomas mentioned doing it even at vCPU creation as long as the
> > prctl has been called.  That is also okay and even simpler.
> 
> Make sense. It also avoids the #GP thing in the emulation path if just due
> to reallocation error.
> 
> We'll follow this direction to do a quick update/test.
> 

After some study there are three opens which we'd like to sync here. Once
they are closed we'll send out a new version very soon (hopefully tomorrow).

1) Have a full expanded buffer at vCPU creation

There are two options.

One is to directly allocate a big-enough buffer upon guest_fpu::perm in
fpu_alloc_guest_fpstate(). There is no reallocation per-se thus most changes
in this series are not required.

The other is to keep the reallocation concept (thus all previous patches are
kept) and still call a wrapper around __xfd_enable_feature() even at vCPU
creation (e.g. right after fpu_init_guest_permissions()). This matches the
fpu core assumption that fpstate for xfd features are dynamically allocated,
though the actual calling point may not be dynamic. This also allows us
to exploit doing expansion in KVM_SET_CPUID2 (see next).

2) Do expansion at vCPU creation or KVM_ SET_CPUID2?

If the reallocation concept is still kept, then we feel doing expansion in
KVM_SET_CPUID2 makes slightly more sense. There is no functional 
difference between two options since the guest is not running at this 
point. And in general Qemu should set prctl according to the cpuid bits. 
But since anyway we still need to check guest cpuid against guest perm in 
KVM_SET_CPUID2, it reads clearer to expand the buffer only after this 
check is passed.

If this approach is agreed, then we may replace the helper functions in
this patch with a new one:

/*
 * fpu_update_guest_perm_features - Enable xfeatures according to guest perm
 * @guest_fpu:		Pointer to the guest FPU container
 *
 * Enable all dynamic xfeatures according to guest perm. Invoked if the
 * caller wants to conservatively expand fpstate buffer instead of waiting
 * until a given feature is accessed.
 *
 * Return: 0 on success, error code otherwise
 */
+int fpu_update_guest_perm_features(struct fpu_guest *guest_fpu)
+{
+	u64 expand;
+
+	lockdep_assert_preemption_enabled();
+
+	if (!IS_ENABLED(CONFIG_X86_64))
+		return 0;
+
+	expand = guest_fpu->perm & ~guest_fpu->xfeatures;
+	if (!expand)
+		return 0;
+
+	return __xfd_enable_feature(expand, guest_fpu);
+}
+EXPORT_SYMBOL_GPL(fpu_update_guest_features);

3) Always disable interception of disable after 1st interception?

Once we choose to have a full expanded buffer before guest runs, the
point of intercepting WRMSR(IA32_XFD) becomes less obvious since
no dynamic reallocation is required.

One option is to always disable WRMSR interception once 
KVM_SET_CPUID2 succeeds, with the cost of one RDMSR per vm-exit. 
But doing so affects legacy OS which even has no XFD logic at all.

The other option is to continue the current policy i.e. disable write 
emulation only after the 1st interception of setting XFD to a non-zero 
value. Then the RDMSR cost is added only for guest which supports XFD.

In either case we need another helper to update guest_fpu->fpstate->xfd
as required in the restore path. For the 2nd option we further want
to update MSR if guest_fpu is currently in use:

+void xfd_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd)
+{
+	fpregs_lock();
+	guest_fpu->fpstae->xfd = xfd;
+	if (guest_fpu->fpstate->in_use)
+		xfd_update_state(guest_fpu->fpstate);
+	fpregs_unlock();
+}

Thoughts?
--
p.s. currently we're working on a quick prototype based on:
  - Expand buffer in KVM_SET_CPUID2
  - Disable write emulation after first interception
to check any oversight. 

Thanks
Kevin

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ