Message-ID: <87k0h85m65.ffs@tglx>
Date: Tue, 16 Nov 2021 13:18:10 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: "Liu, Jing2" <jing2.liu@...el.com>,
Paolo Bonzini <pbonzini@...hat.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"Nakajima, Jun" <jun.nakajima@...el.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Arjan van de Ven <arjan@...ux.intel.com>,
Jing Liu <jing2.liu@...ux.intel.com>,
"seanjc@...gle.com" <seanjc@...gle.com>,
"Cooper, Andrew" <andrew.cooper3@...rix.com>,
"Liu, Jing2" <jing2.liu@...el.com>,
"Bae, Chang Seok" <chang.seok.bae@...el.com>
Subject: Re: Thoughts of AMX KVM support based on latest kernel

Jing,

On Wed, Nov 10 2021 at 13:01, Jing2 Liu wrote:
> Triggering of a reallocation request and error handling
>
> First, we want to avoid weird guest failures at runtime due to (more likely)
> permission failures of a reallocation request, so we check the permissions
> of the vcpu (for the extended features) at kvm_vcpu_ioctl_set_cpuid2() time,
> when QEMU wants to advertise the extended features (e.g. AMX) for the first
> time.
That's the right thing to do. If there is no permission for the guest
granted via the prctl() extension I suggested then exposing AMX should
be rejected.
> We have no idea at vcpu_create() time whether QEMU wants to enable AMX
> or not. If kvm_vcpu_ioctl_set_cpuid2() succeeds, then there is
> no need to further check permission in the reallocation path.
That's correct.
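As a rough illustration of the check discussed above, the idea is to compare the xfeatures the VMM wants to expose against the set that was granted beforehand, and to fail early instead of at reallocation time. This is a standalone sketch, not KVM code: the helper name check_guest_xfeatures() and the choice of -EPERM as the error code are assumptions made for the example.

```c
#include <errno.h>
#include <stdint.h>

/* XSTATE component 18 is AMX tile data (XTILEDATA). */
#define XFEATURE_MASK_XTILE_DATA (UINT64_C(1) << 18)

/*
 * Hypothetical helper modelling the set_cpuid2-time check:
 * 'requested' is the set of xfeatures the VMM wants to expose to the guest,
 * 'permitted' is the set granted beforehand (e.g. via a prctl()).
 * Returns 0 if every requested feature is permitted, -EPERM otherwise
 * (the error code is an assumption of this sketch).
 */
static int check_guest_xfeatures(uint64_t requested, uint64_t permitted)
{
	if (requested & ~permitted)
		return -EPERM;
	return 0;
}
```

With this model, exposing XTILEDATA without a prior grant is rejected up front, so the later reallocation path needs no permission check of its own.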
> Upon detection (interception) of an attempt by a vcpu to write to XCR0 (XSETBV)
> and XFD (WRMSR), we check if the write is valid, and we start passthrough of
> the XFD MSRs if the dynamic feature[i] meets the condition
> XCR0[i]=1 && XFD[i]=0. And we make a reallocation request to the FPU core.
>
> We simplify the KVM implementation by assuming that the reallocation
> request was successful when the vcpu comes back to KVM. For VM exit
> handling that requires a buffer-reallocation request, we don't resume the
> guest immediately. Instead, we return to userspace and rely on the
> userspace VMM (e.g. QEMU) to handle error cases. The actual reallocation
> happens when control is transferred from KVM to the kernel (FPU core). If
> there is no error, QEMU will come back to KVM by repeating vcpu_ioctl_run().
>
> Potential failures there are due to lack of memory. But these are not
> interesting cases; the host would have more serious resource problems at
> that time if that is the case.
Indeed.
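The passthrough condition quoted above (XCR0[i]=1 && XFD[i]=0 for a dynamic feature i) can be sketched as a plain bitmask test. This is only an illustrative model, not KVM code: the helper name xfd_needs_passthrough() and the DYNAMIC_FEATURES mask are assumptions made for the example, and AMX tile data is assumed to be the only dynamic feature.

```c
#include <stdbool.h>
#include <stdint.h>

/* XSTATE component 18 is AMX tile data (XTILEDATA). */
#define XFEATURE_MASK_XTILE_DATA (UINT64_C(1) << 18)

/* Assumption of this sketch: XTILEDATA is the only dynamic feature. */
#define DYNAMIC_FEATURES XFEATURE_MASK_XTILE_DATA

/*
 * Hypothetical helper: true if at least one dynamic feature is enabled
 * in XCR0 (bit set) and not disabled by XFD (bit clear), i.e. the guest
 * may start using the feature and needs the larger buffer.
 */
static bool xfd_needs_passthrough(uint64_t xcr0, uint64_t xfd)
{
	return (xcr0 & ~xfd & DYNAMIC_FEATURES) != 0;
}
```

In this model a guest that enables XTILEDATA in XCR0 but keeps the XFD bit set does not trigger passthrough or a reallocation request; only clearing the XFD bit does.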
> One of the potential drawbacks of Option 2 might be additional
> checks in the host, although we can minimize the impact by having
> CONFIG_KVM_TBD. We believe that the case
> "XFD != 0 and XINUSE != 0" should be very infrequent.
I really don't like the idea of having an extra check in switch_to().

Can we start simple and do something like the uncompiled patch below and
see how much overhead it creates?

Thanks,

        tglx
---
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 0f8b90ab18c9..6175a78e0be8 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -122,4 +122,12 @@ static __always_inline __pure bool fpu_state_size_dynamic(void)
 }
 #endif
 
+void fpu_update_guest_xfd_state(void);
+
+static inline void kvm_update_guest_xfd_state(void)
+{
+	if (fpu_state_size_dynamic())
+		fpu_update_guest_xfd_state();
+}
+
 #endif
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 8ea306b1bf8e..161db48c9052 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -199,6 +199,17 @@ void fpu_reset_from_exception_fixup(void)
 }
 
 #if IS_ENABLED(CONFIG_KVM)
+void fpu_update_guest_xfd_state(void)
+{
+	u64 xfd;
+
+	/* FIXME: Add debug */
+	rdmsrl(MSR_IA32_XFD, xfd);
+	current->thread.fpu.fpstate->xfd = xfd;
+	__this_cpu_write(xfd_state, xfd);
+}
+EXPORT_SYMBOL_GPL(fpu_update_guest_xfd_state);
+
 static void __fpstate_reset(struct fpstate *fpstate);
 
 bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2686f2edb47c..9425fdbb4806 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9576,6 +9576,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	vcpu->arch.last_vmentry_cpu = vcpu->cpu;
 	vcpu->arch.last_guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
 
+	kvm_update_guest_xfd_state();
+
 	vcpu->mode = OUTSIDE_GUEST_MODE;
 	smp_wmb();
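To make the intent of the patch easier to follow, here is a userspace model of why the resync after VM exit matters. With the XFD MSR passed through, the guest can change the hardware value without the host noticing, so any host-side copy goes stale. The sketch assumes the host skips the MSR write when its cached copy already matches; all struct and function names below are hypothetical and only simulate the MSR, the per-CPU cache and the fpstate copy.

```c
#include <stdint.h>

struct cpu_model {
	uint64_t msr_xfd;         /* simulated IA32_XFD hardware register */
	uint64_t xfd_cache;       /* models the per-CPU xfd_state copy */
	unsigned int wrmsr_count; /* number of simulated host MSR writes */
};

struct fpstate_model {
	uint64_t xfd;             /* models fpstate->xfd */
};

/* Host-side write that is skipped when the cached value already matches. */
static void xfd_write_cached(struct cpu_model *cpu, uint64_t xfd)
{
	if (cpu->xfd_cache != xfd) {
		cpu->msr_xfd = xfd;
		cpu->xfd_cache = xfd;
		cpu->wrmsr_count++;
	}
}

/* Guest write with MSR passthrough: hardware changes, host cache does not. */
static void guest_wrmsr_xfd(struct cpu_model *cpu, uint64_t xfd)
{
	cpu->msr_xfd = xfd;
}

/*
 * Models what the patch does after VM exit: read the hardware value back
 * and propagate it to both the guest fpstate and the per-CPU cache.
 */
static void sync_guest_xfd(struct cpu_model *cpu, struct fpstate_model *fps)
{
	uint64_t xfd = cpu->msr_xfd;

	fps->xfd = xfd;
	cpu->xfd_cache = xfd;
}
```

Without the sync step in this model, a later xfd_write_cached() call with the old value would be skipped because the cache still matches, leaving the guest's value in the hardware register.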