linux-kernel - Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <6bbc5184-a675-1937-eb98-639906a9cf15@redhat.com>
Date:   Thu, 14 Oct 2021 11:01:11 +0200
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     "Liu, Jing2" <jing2.liu@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>
Cc:     "x86@...nel.org" <x86@...nel.org>,
        "Bae, Chang Seok" <chang.seok.bae@...el.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Arjan van de Ven <arjan@...ux.intel.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "Nakajima, Jun" <jun.nakajima@...el.com>,
        Jing Liu <jing2.liu@...ux.intel.com>,
        "seanjc@...gle.com" <seanjc@...gle.com>,
        "Cooper, Andrew" <andrew.cooper3@...rix.com>
Subject: Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core

On 14/10/21 10:02, Liu, Jing2 wrote:
>> In principle I don't like it very much; it would be nicer to say "you
>> enable it for QEMU itself via arch_prctl(ARCH_SET_STATE_ENABLE), and for
>> the guests via ioctl(KVM_SET_CPUID2)".  But I can see why you want to
>> keep things simple, so it's not a strong objection at all.
> 
> Does this mean that KVM allocates the 3 buffers upon
> 1) QEMU's request, instead of upon 2) the guest XCR0 trap?

Based on the input from Andy and Thomas, the new way would be like this:

1) host_fpu must always be checked for reallocation in 
kvm_load_guest_fpu (or in the FPU functions that it calls, depending 
on the rest of Thomas's patches).  That's because arch_prctl can enable 
AMX for QEMU at any point after KVM_CREATE_VCPU.
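A minimal user-space model of point 1 may help. It assumes a hypothetical struct fpu_model that records which xfeatures a buffer has room for (allocated) and which were permitted via arch_prctl (perm); host_fpu_check_realloc() is likewise a hypothetical name, not a kernel function. Because perm can grow at any time after KVM_CREATE_VCPU, the check has to run on every load rather than once at vCPU creation:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical stand-in for an FPU buffer; not the kernel's struct fpu. */
struct fpu_model {
	u64 allocated;	/* xfeatures the buffer currently has room for */
	u64 perm;	/* xfeatures permitted so far via arch_prctl() */
};

/*
 * Hypothetical check to run on every kvm_load_guest_fpu(): arch_prctl()
 * may have widened 'perm' since the vCPU was created, so the buffer must
 * be grown before any newly permitted state is saved into it.
 * Returns true if the buffer had to grow.
 */
static bool host_fpu_check_realloc(struct fpu_model *fpu)
{
	u64 missing = fpu->perm & ~fpu->allocated;

	if (!missing)
		return false;
	/* A real implementation would allocate a larger buffer here. */
	fpu->allocated |= missing;
	return true;
}
```

For example, starting from allocated == perm == 0x3 the check has nothing to do; after bit 18 (AMX tile data) is added to perm it reports one reallocation, and the following call is again a no-op.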

2) every use of vcpu->arch.guest_supported_xcr0 is changed so that, 
among the dynamic-feature bits, it includes only those that were 
enabled via arch_prctl.  That is, something like:

static u64 kvm_guest_supported_xcr0(struct kvm_vcpu *vcpu)
{
	/* Drop dynamic features that the VMM has not enabled via arch_prctl. */
	return vcpu->arch.guest_supported_xcr0 &
		(~xfeatures_mask_user_dynamic |
		 current->thread.fpu.dynamic_state_perm);
}
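The masking in kvm_guest_supported_xcr0() can be checked in isolation. The sketch below assumes that AMX tile data (XSAVE component 18) is the only dynamic feature, and it passes the CPUID-derived XCR0 and the process permission as plain integers instead of reading them from the vCPU and from current; guest_supported_xcr0() is a hypothetical helper name:

```c
#include <stdint.h>

typedef uint64_t u64;

/* Assumption: AMX tile data (XSAVE component 18) is the only dynamic feature. */
#define XFEATURE_MASK_XTILE_DATA	(1ULL << 18)
static const u64 xfeatures_mask_user_dynamic = XFEATURE_MASK_XTILE_DATA;

/*
 * Keep every static feature that guest CPUID allows, but keep a dynamic
 * feature only if the VMM process was granted it via arch_prctl().
 */
static u64 guest_supported_xcr0(u64 cpuid_xcr0, u64 dynamic_state_perm)
{
	return cpuid_xcr0 & (~xfeatures_mask_user_dynamic | dynamic_state_perm);
}
```

With a CPUID XCR0 of 0x60007 (x87, SSE, AVX, and components 17 and 18), the result in this model is 0x20007 until the process has been granted bit 18, and 0x60007 afterwards; static features, including bit 17 here, pass through either way.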

3) Even with passthrough disabled, the guest can run with XFD set to 
vcpu->arch.guest_xfd (and likewise for XFD_ERR), which is much simpler 
than trapping #NM.  The traps for writing XCR0 and XFD are used to 
allocate dynamic state for guest_fpu and to start the passthrough of 
XFD and XFD_ERR.  What we need is:

- if a dynamic state has XCR0[n]=0, bit n will never be set in XFD_ERR 
and the state will never be dirtied by the guest.

- if a dynamic state has XCR0[n]=1, but all enabled dynamic states have 
XFD[n]=1, the guest is not able to dirty any dynamic XSAVE state, 
because they all have either XCR0[n]=0 or XFD[n]=1.  An attempt to do so 
will cause an #NM trap and set the bit in XFD_ERR.

- if a dynamic state has XCR0[n]=1 and XFD[n]=0, the state for bit n is 
allocated in guest_fpu, and KVM can also disable the vmexits for XFD 
and XFD_ERR.
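The three cases can be modeled per component as an illustrative sketch; guest_touch() and enum touch_result are hypothetical names. The #UD outcome for XCR0[n]=0 is the architectural behavior for a disabled XSAVE feature and is spelled out here only to make the first case concrete:

```c
#include <stdint.h>

typedef uint64_t u64;

enum touch_result {
	TOUCH_UD,	/* XCR0[n]=0: feature disabled, XFD_ERR untouched */
	TOUCH_NM,	/* XCR0[n]=1, XFD[n]=1: #NM, bit n set in XFD_ERR */
	TOUCH_OK,	/* XCR0[n]=1, XFD[n]=0: state n dirtied, must be allocated */
};

/*
 * Illustrative model of what happens when the guest executes an
 * instruction that uses dynamic XSAVE component n.
 */
static enum touch_result guest_touch(u64 xcr0, u64 xfd, u64 *xfd_err,
				     unsigned int n)
{
	u64 bit = 1ULL << n;

	if (!(xcr0 & bit))
		return TOUCH_UD;
	if (xfd & bit) {
		*xfd_err |= bit;	/* record which component faulted */
		return TOUCH_NM;
	}
	return TOUCH_OK;
}
```

Only the TOUCH_OK case can dirty dynamic state, which is why allocation can wait until some enabled component has XFD[n]=0.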

Therefore:

- if passthrough is disabled, the XCR0 and XFD write traps can check 
guest_xcr0 & ~guest_xfd.  If it includes a dynamic state bit, dynamic 
state is allocated for all bits enabled in guest_xcr0 and passthrough is 
started; this should happen shortly after the guest gets its first #NM 
trap for AMX.

- if passthrough is enabled, the XCR0 write trap must still ensure that 
dynamic state is allocated for all bits enabled in guest_xcr0.

So something like this pseudocode would be called from both the XCR0 and XFD write traps:

int kvm_alloc_fpu_dynamic_features(struct kvm_vcpu *vcpu)
{
	u64 allowed_dynamic = current->thread.fpu.dynamic_state_perm;
	u64 enabled_dynamic =
		vcpu->arch.xcr0 & xfeatures_mask_user_dynamic;

	/* All dynamic features have to be arch_prctl'd first.  */
	WARN_ON_ONCE(enabled_dynamic & ~allowed_dynamic);

	if (!vcpu->arch.xfd_passthrough) {
		/*
		 * If every enabled dynamic state is still armed in XFD, the
		 * guest cannot touch any of them without an #NM: wait and see.
		 */
		if ((enabled_dynamic & ~vcpu->arch.xfd) == 0)
			return 0;

		/* Stop intercepting XFD and XFD_ERR from now on. */
		kvm_x86_ops.enable_xfd_passthrough(vcpu);
	}

	/*
	 * current->thread.fpu was already handled by arch_prctl; allocate
	 * guest_fpu storage for every dynamic state enabled in XCR0.
	 */
	return fpu_alloc_features(vcpu->arch.guest_fpu,
		vcpu->arch.guest_fpu->dynamic_state_perm | enabled_dynamic);
}
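Since the pseudocode relies on kernel internals, it cannot be built on its own. The following user-space model keeps the same control flow so that the XCR0/XFD write sequence can be exercised. struct mock_vcpu, struct mock_fpu, process_dynamic_perm and mock_fpu_alloc_features() are hypothetical stand-ins for the vCPU, guest_fpu, current->thread.fpu.dynamic_state_perm and fpu_alloc_features(), and the passthrough hook is reduced to setting a flag:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t u64;

/* Assumption: AMX tile data (component 18) is the only dynamic feature. */
static const u64 xfeatures_mask_user_dynamic = 1ULL << 18;

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct mock_fpu {
	u64 dynamic_state_perm;	/* dynamic features with allocated storage */
};

struct mock_vcpu {
	u64 xcr0, xfd;
	bool xfd_passthrough;
	struct mock_fpu guest_fpu;
};

/* Models current->thread.fpu.dynamic_state_perm (set by arch_prctl). */
static u64 process_dynamic_perm;

/* Models fpu_alloc_features(): grow storage to cover 'features'. */
static int mock_fpu_alloc_features(struct mock_fpu *fpu, u64 features)
{
	fpu->dynamic_state_perm = features;
	return 0;
}

static int alloc_fpu_dynamic_features(struct mock_vcpu *vcpu)
{
	u64 allowed_dynamic = process_dynamic_perm;
	u64 enabled_dynamic = vcpu->xcr0 & xfeatures_mask_user_dynamic;

	/* Rough stand-in for WARN_ON_ONCE() (prints on every hit). */
	if (enabled_dynamic & ~allowed_dynamic)
		fprintf(stderr, "warning: dynamic feature not permitted\n");

	if (!vcpu->xfd_passthrough) {
		/* Every enabled dynamic state would still #NM: wait. */
		if ((enabled_dynamic & ~vcpu->xfd) == 0)
			return 0;
		vcpu->xfd_passthrough = true;	/* enable_xfd_passthrough() */
	}

	return mock_fpu_alloc_features(&vcpu->guest_fpu,
			vcpu->guest_fpu.dynamic_state_perm | enabled_dynamic);
}
```

A typical sequence: the guest sets XCR0[18] while XFD[18] is still set, so nothing is allocated; once it clears XFD[18] (for example from its #NM handler), the trapped XFD write allocates bit 18 and switches to passthrough.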

Paolo