Message-Id: <20211208000359.2853257-1-yang.zhong@intel.com>
Date: Tue, 7 Dec 2021 19:03:40 -0500
From: Yang Zhong <yang.zhong@...el.com>
To: x86@...nel.org, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, pbonzini@...hat.com
Cc: seanjc@...gle.com, jun.nakajima@...el.com, kevin.tian@...el.com,
jing2.liu@...ux.intel.com, jing2.liu@...el.com,
yang.zhong@...el.com
Subject: [PATCH 00/19] AMX Support in KVM
(sent on behalf of Jing, who is currently on leave)
This series brings AMX (Advanced Matrix eXtensions) virtualization
support to KVM. The three preparation patches in fpu core from
Thomas [1] are also included.
A large portion of the changes in this series deals with eXtended
Feature Disable (XFD), which allows resizing of the fpstate buffer to
support dynamically-enabled XSTATE features with large state components
(e.g. 8K for AMX).
The support is based on several key changes (design discussions can be
found in [2]):
- Guest permissions for dynamically-enabled XSAVE features
Native tasks have to request permission via prctl() before touching
a dynamically-resized XSTATE component. Introduce guest permissions
for a similar purpose. The userspace VMM is expected to request guest
permission only once, when the first vCPU is created.
KVM checks guest permission in KVM_SET_CPUID2. Setting XFD in guest
cpuid w/o proper permissions fails this operation.
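As an illustration of the permission step, here is a minimal userspace
sketch of how a VMM might request guest permission for AMX tile data
once, before creating vCPUs. The constants are assumptions taken from
the x86 prctl uapi header as later merged upstream (the request goes
through the arch_prctl() syscall); the values in this series may differ.
The helper names perm_has_xtiledata() and request_guest_amx_perm() are
hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed values, matching arch/x86/include/uapi/asm/prctl.h upstream. */
#ifndef ARCH_GET_XCOMP_GUEST_PERM
#define ARCH_GET_XCOMP_GUEST_PERM 0x1024
#endif
#ifndef ARCH_REQ_XCOMP_GUEST_PERM
#define ARCH_REQ_XCOMP_GUEST_PERM 0x1025
#endif
#define XFEATURE_XTILEDATA 18 /* XSAVE component number of AMX tile data */

/* Hypothetical helper: does a permission bitmap include AMX tile data? */
static bool perm_has_xtiledata(uint64_t perm)
{
	return (perm >> XFEATURE_XTILEDATA) & 1;
}

/*
 * Hypothetical helper: request guest permission, then read the bitmap
 * back to confirm. Returns 0 on success, -1 if the kernel or CPU does
 * not support the request (or on non-x86 builds).
 */
static int request_guest_amx_perm(void)
{
#ifdef SYS_arch_prctl
	uint64_t perm = 0;

	if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
		    XFEATURE_XTILEDATA))
		return -1;
	if (syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &perm))
		return -1;
	return perm_has_xtiledata(perm) ? 0 : -1;
#else
	return -1;
#endif
}
```

A VMM would call request_guest_amx_perm() once at startup and only then
advertise AMX in the CPUID passed to KVM_SET_CPUID2.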
- Extend fpstate reallocation mechanism to cover guest fpu
Unlike native tasks, which have reallocation triggered from the #NM
handler, guest fpstate reallocation is requested by KVM when it
detects the guest's intention to use dynamically-enabled XSAVE
features.
The reallocation request is handled when exiting to the userspace
VMM. This implies that KVM must break out of the vcpu_run() loop and
exit to the userspace VMM instead of immediately resuming the guest
when reallocation is required.
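The control flow described above can be modelled in plain C. This is a
toy sketch, not the KVM code: struct run_vcpu, enter_guest(),
handle_realloc_on_exit() and vcpu_run_model() are all hypothetical
names, and the "guest" simply raises the request on its third entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum run_result { RUN_EXIT_USERSPACE, RUN_ERROR };

/* Hypothetical per-vCPU model. */
struct run_vcpu {
	bool realloc_request; /* set when a larger fpstate is needed */
	size_t fpstate_size;
	void *fpstate;
	int guest_steps;      /* number of guest entries so far */
};

/* One guest entry/exit; the third entry triggers a realloc request. */
static void enter_guest(struct run_vcpu *v)
{
	if (++v->guest_steps == 3)
		v->realloc_request = true;
}

/* Exit-to-userspace path: the only place the buffer is resized. */
static int handle_realloc_on_exit(struct run_vcpu *v, size_t new_size)
{
	void *buf = realloc(v->fpstate, new_size);

	if (!buf)
		return -1; /* error is propagated to the caller */
	v->fpstate = buf;
	v->fpstate_size = new_size;
	v->realloc_request = false;
	return 0;
}

static enum run_result vcpu_run_model(struct run_vcpu *v, size_t new_size)
{
	for (;;) {
		enter_guest(v);
		if (v->realloc_request)
			break; /* leave the loop instead of resuming */
	}
	return handle_realloc_on_exit(v, new_size) ? RUN_ERROR
						   : RUN_EXIT_USERSPACE;
}
```

The point of the model is that the run loop never resizes the buffer
itself; it only notices the request and leaves.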
- Detect fpstate reallocation in the emulation code
Because guest #NM is not trapped in KVM (costly), the guest's
intention to use a dynamically-enabled XSAVE feature[i] can be
indirectly represented by guest XCR0[i]=1 and XFD[i]=0. This
requires the emulation logic of both WRMSR(IA32_XFD) and XSETBV
to check whether reallocation is required whenever either of the
two values changes.
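The XCR0[i]=1 and XFD[i]=0 condition reduces to a bitmask test. The
sketch below assumes AMX tile data (component 18) is the only
dynamically-enabled feature; struct vcpu_model and the function names
are hypothetical and do not come from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_MASK_XTILEDATA (1ULL << 18)
/* Assumption: tile data is the only dynamically-enabled feature. */
#define DYNAMIC_FEATURES XFEATURE_MASK_XTILEDATA

/* Hypothetical model of the per-vCPU state involved in the check. */
struct vcpu_model {
	uint64_t xcr0;      /* guest XCR0 */
	uint64_t xfd;       /* guest IA32_XFD */
	uint64_t allocated; /* dynamic features the fpstate already covers */
};

/* Dynamic features the guest intends to use: XCR0[i]=1 and XFD[i]=0. */
static uint64_t wanted_dynamic(const struct vcpu_model *v)
{
	return v->xcr0 & ~v->xfd & DYNAMIC_FEATURES;
}

static bool realloc_needed(const struct vcpu_model *v)
{
	return (wanted_dynamic(v) & ~v->allocated) != 0;
}

/* Both emulation paths update one value and re-run the same check. */
static bool emulate_xsetbv(struct vcpu_model *v, uint64_t new_xcr0)
{
	v->xcr0 = new_xcr0;
	return realloc_needed(v);
}

static bool emulate_wrmsr_xfd(struct vcpu_model *v, uint64_t new_xfd)
{
	v->xfd = new_xfd;
	return realloc_needed(v);
}
```

Enabling the XCR0 bit while the XFD bit is still set does not trigger
reallocation in this model; it is triggered only once the guest also
clears the XFD bit, whichever of the two writes happens last.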
- Disable WRMSR interception for IA32_XFD
IA32_XFD can be frequently updated by the guest, as it is part of
the task state and is swapped on context switch when prev and next
have different XFD settings. Always intercepting WRMSR can easily
cause non-negligible overhead.
Disable WRMSR interception for IA32_XFD after fpstate reallocation
succeeds. After that point the guest writes IA32_XFD directly without
causing VM-exits.
However, MSR passthrough implies that guest_fpstate::xfd and the per-cpu
xfd cache might be out of sync with the current IA32_XFD value set by
the guest. This means KVM needs to re-sync the software state
with IA32_XFD before the vCPU thread might be preempted or interrupted.
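The out-of-sync problem and the re-sync step can be shown with a small
model. Here a plain variable stands in for the hardware MSR (a kernel
would read it with rdmsrl()); struct guest_fpstate_model,
percpu_xfd_cache and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the hardware IA32_XFD MSR. */
static uint64_t hw_ia32_xfd;

/* Models the per-cpu xfd cache. */
static uint64_t percpu_xfd_cache;

/* Hypothetical model of the software copy (guest_fpstate::xfd). */
struct guest_fpstate_model {
	uint64_t xfd;
};

/* Guest write with interception disabled: only hardware changes. */
static void guest_wrmsr_xfd_passthrough(uint64_t val)
{
	hw_ia32_xfd = val;
}

/* Re-sync software state from hardware before preemption/interrupt. */
static void xfd_resync(struct guest_fpstate_model *fps)
{
	fps->xfd = hw_ia32_xfd;
	percpu_xfd_cache = hw_ia32_xfd;
}
```

After a passthrough write the software copies are stale until
xfd_resync() runs, which is why the re-sync has to happen before any
host code that trusts those copies can execute.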
- Save/restore guest XFD_ERR
When XFD causes an instruction to generate #NM, XFD_ERR contains
information about which disabled state components are being accessed.
The #NM handler is expected to check this information and then enable
the state components by clearing IA32_XFD for the faulting task (if
it has permission).
#NM can be triggered in both host and guest. It would be problematic if
the XFD_ERR value generated in the guest is consumed/clobbered by the
host before the guest itself does so. This may lead to a non-XFD-related
#NM being treated as an XFD #NM in the host (due to the guest XFD_ERR
value), or an XFD-related #NM being treated as a non-XFD #NM in the
guest (XFD_ERR cleared by the host #NM handler).
KVM needs to save the guest XFD_ERR value before this register
might be accessed by the host and restore it before entering the
guest.
One open question remains in this area: when to start saving/restoring
guest XFD_ERR. Several options are discussed in patch 15.
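A minimal model of the save/restore idea follows. A variable stands in
for the XFD_ERR MSR, and all names are hypothetical. Clearing the
hardware value at save time is an assumption made here so that a host
#NM is not misread as XFD-related; the series discusses the actual
options in patch 15.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the hardware XFD_ERR MSR. */
static uint64_t hw_xfd_err;

/* Hypothetical per-vCPU storage for the guest value. */
struct xfd_err_vcpu {
	uint64_t guest_xfd_err;
};

/* After VM-exit: stash the guest value and clear hardware (assumed). */
static void save_guest_xfd_err(struct xfd_err_vcpu *v)
{
	v->guest_xfd_err = hw_xfd_err;
	hw_xfd_err = 0;
}

/* Before VM-entry: put the guest value back. */
static void restore_guest_xfd_err(const struct xfd_err_vcpu *v)
{
	hw_xfd_err = v->guest_xfd_err;
}

/* Host #NM handler model: consumes and clears XFD_ERR. */
static uint64_t host_nm_handler(void)
{
	uint64_t err = hw_xfd_err;

	hw_xfd_err = 0;
	return err;
}
```

With the save in place, a host #NM that fires between exit and entry
sees zero rather than the guest's value, and the guest still finds its
own value after the restore.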
- Expose related cpuid bits to guest
The last step is to allow exposing XFD, AMX_TILE, AMX_INT8 and
AMX_BF16 in guest cpuid. Adding those bits into kvm_cpu_caps finally
activates all of the preceding logic in this series.
To verify AMX virtualization overhead on non-AMX usages, we ran the
Phoronix kernel build test in the guest w/ and w/o AMX in cpuid. The
result shows no observable difference between the two configurations.
Live migration support is still being worked on. The userspace VMM needs
to use the new KVM_{G|S}ET_XSAVE2 ioctl in this series to migrate state
for dynamically-enabled XSAVE features.
Thanks Thomas for the thoughts and patches on the KVM FPU and AMX
support. Thanks Jun Nakajima for the design suggestions.
[1] git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/fpu-kvm
[2] https://www.spinics.net/lists/kvm/msg259015.html
Thanks,
Yang
---
Jing Liu (13):
kvm: x86: Fix xstate_required_size() to follow XSTATE alignment rule
kvm: x86: Check guest xstate permissions when KVM_SET_CPUID2
x86/fpu: Move xfd initialization out of __fpstate_reset() to the
callers
kvm: x86: Propagate fpstate reallocation error to userspace
x86/fpu: Move xfd_update_state() to xstate.c and export symbol
kvm: x86: Prepare reallocation check
kvm: x86: Emulate WRMSR of guest IA32_XFD
kvm: x86: Disable WRMSR interception for IA32_XFD on demand
x86/fpu: Prepare for KVM XFD_ERR handling
kvm: x86: Introduce KVM_{G|S}ET_XSAVE2 ioctl
docs: virt: api.rst: Document the new KVM_{G, S}ET_XSAVE2 ioctls
kvm: x86: AMX XCR0 support for guest
kvm: x86: Add AMX CPUIDs support
Thomas Gleixner (4):
x86/fpu: Extend prctl() with guest permissions
x86/fpu: Prepare KVM for dynamically enabled states
x86/fpu: Add reallocation mechanims for KVM
x86/fpu: Prepare KVM for bringing XFD state back in-sync
Yang Zhong (2):
kvm: x86: Check fpstate reallocation in XSETBV emulation
kvm: x86: Save and restore guest XFD_ERR properly
Documentation/virt/kvm/api.rst | 47 +++++++
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/fpu/api.h | 12 ++
arch/x86/include/asm/fpu/types.h | 56 +++++++++
arch/x86/include/asm/fpu/xstate.h | 2 +
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/include/uapi/asm/kvm.h | 6 +
arch/x86/include/uapi/asm/prctl.h | 26 ++--
arch/x86/kernel/fpu/core.c | 109 ++++++++++++++++-
arch/x86/kernel/fpu/xstate.c | 119 +++++++++++++++---
arch/x86/kernel/fpu/xstate.h | 29 +++--
arch/x86/kernel/process.c | 2 +
arch/x86/kvm/cpuid.c | 36 +++++-
arch/x86/kvm/vmx/vmx.c | 20 +++
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.c | 189 ++++++++++++++++++++++++++++-
arch/x86/kvm/x86.h | 2 +
include/uapi/linux/kvm.h | 8 +-
19 files changed, 607 insertions(+), 63 deletions(-)