Message-ID: <6bbc5184-a675-1937-eb98-639906a9cf15@redhat.com>
Date: Thu, 14 Oct 2021 11:01:11 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: "Liu, Jing2" <jing2.liu@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>
Cc: "x86@...nel.org" <x86@...nel.org>,
"Bae, Chang Seok" <chang.seok.bae@...el.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Arjan van de Ven <arjan@...ux.intel.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"Nakajima, Jun" <jun.nakajima@...el.com>,
Jing Liu <jing2.liu@...ux.intel.com>,
"seanjc@...gle.com" <seanjc@...gle.com>,
"Cooper, Andrew" <andrew.cooper3@...rix.com>
Subject: Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core
On 14/10/21 10:02, Liu, Jing2 wrote:
>> In principle I don't like it very much; it would be nicer to say "you
>> enable it for QEMU itself via arch_prctl(ARCH_SET_STATE_ENABLE), and for
>> the guests via ioctl(KVM_SET_CPUID2)". But I can see why you want to
>> keep things simple, so it's not a strong objection at all.
>
> Does this mean that KVM allocates 3 buffers via
> 1) QEMU's request, instead of via 2) guest XCR0 trap?
Based on the input from Andy and Thomas, the new way would be like this:
1) host_fpu must always be checked for reallocation in
kvm_load_guest_fpu (or in the FPU functions that it calls, that depends
on the rest of Thomas's patches). That's because arch_prctl can enable
AMX for QEMU at any point after KVM_CREATE_VCPU.
2) every use of vcpu->arch.guest_supported_xcr0 is changed to only
include those dynamic-feature bits that were enabled via arch_prctl.
That is, something like:
static u64 kvm_guest_supported_xcr0(struct kvm_vcpu *vcpu)
{
	return vcpu->arch.guest_supported_xcr0 &
		(~xfeatures_mask_user_dynamic |
		 current->thread.fpu.dynamic_state_perm);
}
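To see what the mask above does with concrete bits, here is a minimal user-space sketch. It assumes AMX tile data (XSAVE feature bit 18) is the only dynamic feature; MODEL_DYNAMIC_MASK and model_guest_supported_xcr0() are hypothetical names for illustration, not kernel symbols.

```c
#include <stdint.h>

/* XSAVE feature bits, architectural numbering. */
#define XFEATURE_MASK_FP         (1ULL << 0)
#define XFEATURE_MASK_SSE        (1ULL << 1)
#define XFEATURE_MASK_YMM        (1ULL << 2)
#define XFEATURE_MASK_XTILE_CFG  (1ULL << 17)
#define XFEATURE_MASK_XTILE_DATA (1ULL << 18)

/* Assumption: AMX tile data is the only dynamically enabled feature. */
#define MODEL_DYNAMIC_MASK XFEATURE_MASK_XTILE_DATA

/*
 * Hypothetical model of the helper: static features pass through
 * unchanged, dynamic ones survive only if the process was granted
 * permission for them (the "perm" argument stands in for
 * current->thread.fpu.dynamic_state_perm).
 */
static uint64_t model_guest_supported_xcr0(uint64_t supported, uint64_t perm)
{
	return supported & (~MODEL_DYNAMIC_MASK | perm);
}
```

With no permission granted, bit 18 is dropped from the result while x87, SSE, AVX and the tile configuration bit are untouched; once permission for bit 18 is present, the full supported mask comes back.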
3) Even with passthrough disabled, the guest can run with XFD set to
vcpu->arch.guest_xfd (and likewise for XFD_ERR) which is much simpler
than trapping #NM. The traps for writing XCR0 and XFD are used to
allocate dynamic state for guest_fpu, and start the passthrough of XFD
and XFD_ERR. What we need is:
- if a dynamic state has XCR0[n]=0, bit n will never be set in XFD_ERR
and the state will never be dirtied by the guest.
- if a dynamic state has XCR0[n]=1, but all enabled dynamic states have
XFD[n]=1, the guest is not able to dirty any dynamic XSAVE state,
because they all have either XCR0[n]=0 or XFD[n]=1. An attempt to do so
will cause an #NM trap and set the bit in XFD_ERR.
- if a dynamic state has XCR0[n]=1 and XFD[n]=0, the state for bit n must be
allocated in guest_fpu, and KVM can also disable the vmexits for XFD and
XFD_ERR.
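The three cases above can be written down as a small classifier. This is only an illustrative user-space sketch; the enum and classify_dynamic_state() are hypothetical names, and n is assumed to be a valid XSAVE feature number below 64.

```c
#include <stdint.h>

enum dyn_state_status {
	DYN_DISABLED,	/* XCR0[n]=0: never dirtied, never set in XFD_ERR */
	DYN_ARMED,	/* XCR0[n]=1, XFD[n]=1: first use raises #NM */
	DYN_LIVE,	/* XCR0[n]=1, XFD[n]=0: guest can dirty the state */
};

/* Classify dynamic feature n from the guest's XCR0 and XFD values. */
static enum dyn_state_status classify_dynamic_state(uint64_t xcr0,
						    uint64_t xfd,
						    unsigned int n)
{
	uint64_t bit = 1ULL << n;	/* caller guarantees n < 64 */

	if (!(xcr0 & bit))
		return DYN_DISABLED;
	if (xfd & bit)
		return DYN_ARMED;
	return DYN_LIVE;
}
```

Only DYN_LIVE requires backing storage in guest_fpu; the other two can be left alone until the guest changes XCR0 or XFD.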
Therefore:
- if passthrough is disabled, the XCR0 and XFD write traps can check
guest_xcr0 & ~guest_xfd. If it includes a dynamic state bit, dynamic
state is allocated for all bits enabled in guest_xcr0 and passthrough is
started; this should happen shortly after the guest gets its first #NM
trap for AMX.
- if passthrough is enabled, the XCR0 write trap must still ensure that
dynamic state is allocated for all bits enabled in guest_xcr0.
So something like this pseudocode is called by both XCR0 and XFD writes:
int kvm_alloc_fpu_dynamic_features(struct kvm_vcpu *vcpu)
{
	u64 allowed_dynamic = current->thread.fpu.dynamic_state_perm;
	u64 enabled_dynamic =
		vcpu->arch.xcr0 & xfeatures_mask_user_dynamic;

	/* All dynamic features have to be arch_prctl'd first. */
	WARN_ON_ONCE(enabled_dynamic & ~allowed_dynamic);

	if (!vcpu->arch.xfd_passthrough) {
		/* All dynamic states will #NM? Wait and see. */
		if ((enabled_dynamic & ~vcpu->arch.xfd) == 0)
			return 0;

		kvm_x86_ops.enable_xfd_passthrough(vcpu);
	}

	/* current->thread.fpu was already handled by arch_prctl. */
	return fpu_alloc_features(vcpu->guest_fpu,
		vcpu->guest_fpu.dynamic_state_perm | enabled_dynamic);
}
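For anyone who wants to step through the transitions outside the kernel, the same decision flow can be modelled in user space. This is a rough sketch under stated assumptions: struct model_vcpu and model_alloc_dynamic() are hypothetical, AMX tile data (bit 18) is taken as the only dynamic feature, allocation is reduced to setting a bit, and a missing permission fails with -1 where the pseudocode only warns.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_XTILE_DATA (1ULL << 18)	/* AMX tile data, XSAVE bit 18 */

/* Hypothetical stand-in for the vCPU fields the pseudocode touches. */
struct model_vcpu {
	uint64_t xcr0;			/* guest XCR0 */
	uint64_t xfd;			/* guest XFD */
	uint64_t guest_allocated;	/* dynamic features backed in guest_fpu */
	bool xfd_passthrough;
};

/*
 * Model of the handler run after a guest write to XCR0 or XFD.
 * Returns 0 on success, -1 if the guest enabled a dynamic feature
 * that the process holds no permission for (assumption of this model).
 */
static int model_alloc_dynamic(struct model_vcpu *v, uint64_t allowed_dynamic)
{
	uint64_t enabled_dynamic = v->xcr0 & MODEL_XTILE_DATA;

	if (enabled_dynamic & ~allowed_dynamic)
		return -1;

	if (!v->xfd_passthrough) {
		/* Every enabled dynamic state would still #NM: wait. */
		if ((enabled_dynamic & ~v->xfd) == 0)
			return 0;
		v->xfd_passthrough = true;
	}

	/* Passthrough is on: back every dynamic feature enabled in XCR0. */
	v->guest_allocated |= enabled_dynamic;
	return 0;
}
```

In this model a guest that sets XCR0[18] with XFD[18] still set leaves everything unallocated; clearing XFD[18] afterwards turns passthrough on and allocates the state, and setting XFD[18] again later does not undo either.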
Paolo