[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20211229131328.12283-22-yang.zhong@intel.com>
Date: Wed, 29 Dec 2021 05:13:28 -0800
From: Yang Zhong <yang.zhong@...el.com>
To: x86@...nel.org, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-kselftest@...r.kernel.org,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, pbonzini@...hat.com, corbet@....net,
shuah@...nel.org
Cc: seanjc@...gle.com, jun.nakajima@...el.com, kevin.tian@...el.com,
jing2.liu@...ux.intel.com, jing2.liu@...el.com,
guang.zeng@...el.com, wei.w.wang@...el.com, yang.zhong@...el.com
Subject: [PATCH v4 21/21] kvm: x86: Disable interception for IA32_XFD on demand
From: Kevin Tian <kevin.tian@...el.com>
Always intercepting IA32_XFD causes non-negligible overhead when this
register is updated frequently in the guest.
Disable r/w emulation after intercepting the first WRMSR(IA32_XFD)
with a non-zero value.
Disable WRMSR emulation implies that IA32_XFD becomes out-of-sync
with the software states in fpstate and the per-cpu xfd cache. Call
fpu_sync_guest_vmexit_xfd_state() to bring them back in-sync, before
preemption is enabled.
Always trap #NM once write interception is disabled for IA32_XFD.
The #NM exception is rare if the guest doesn't use dynamic features.
Otherwise, there is at most one exception per guest task given a
dynamic feature.
p.s. We have confirmed that SDM is being revised to say that
when setting IA32_XFD[18] the AMX register state is not guaranteed
to be preserved. This clarification avoids adding mess for a creative
guest which sets IA32_XFD[18]=1 before saving active AMX state to
its own storage.
Signed-off-by: Kevin Tian <kevin.tian@...el.com>
Signed-off-by: Jing Liu <jing2.liu@...el.com>
Signed-off-by: Yang Zhong <yang.zhong@...el.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx/vmx.c | 23 ++++++++++++++++++-----
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.c | 3 +++
4 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 555f4de47ef2..a372dfe6f407 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -640,6 +640,7 @@ struct kvm_vcpu_arch {
u64 smi_count;
bool tpr_access_reporting;
bool xsaves_enabled;
+ bool xfd_no_write_intercept;
u64 ia32_xss;
u64 microcode_version;
u64 arch_capabilities;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 638665b3e241..21c4d7409433 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -162,6 +162,7 @@ static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = {
MSR_FS_BASE,
MSR_GS_BASE,
MSR_KERNEL_GS_BASE,
+ MSR_IA32_XFD,
MSR_IA32_XFD_ERR,
#endif
MSR_IA32_SYSENTER_CS,
@@ -766,10 +767,11 @@ void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu)
}
/*
- * Trap #NM if guest xfd contains a non-zero value so guest XFD_ERR
- * can be saved timely.
+ * Disabling xfd interception indicates that dynamically-enabled
+ * features might be used in the guest. Always trap #NM in this case
+ * for proper virtualization of guest xfd_err.
*/
- if (vcpu->arch.guest_fpu.fpstate->xfd)
+ if (vcpu->arch.xfd_no_write_intercept)
eb |= (1u << NM_VECTOR);
vmcs_write32(EXCEPTION_BITMAP, eb);
@@ -1971,9 +1973,20 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
case MSR_IA32_XFD:
ret = kvm_set_msr_common(vcpu, msr_info);
- /* Update #NM interception according to guest xfd */
- if (!ret)
+ /*
+ * Always intercepting WRMSR could incur non-negligible
+ * overhead given xfd might be touched in guest context
+ * switch. Disable write interception upon the first write
+ * with a non-zero value (indicate potential guest usage
+ * on dynamic xfeatures). Also update exception bitmap
+ * to trap #NM for proper virtualization of guest xfd_err.
+ */
+ if (!ret && data) {
+ vmx_disable_intercept_for_msr(vcpu, MSR_IA32_XFD,
+ MSR_TYPE_RW);
+ vcpu->arch.xfd_no_write_intercept = true;
vmx_update_exception_bitmap(vcpu);
+ }
break;
#endif
case MSR_IA32_SYSENTER_CS:
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index bf9d3051cd6c..0a00242a91e7 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -340,7 +340,7 @@ struct vcpu_vmx {
struct lbr_desc lbr_desc;
/* Save desired MSR intercept (read: pass-through) state */
-#define MAX_POSSIBLE_PASSTHROUGH_MSRS 14
+#define MAX_POSSIBLE_PASSTHROUGH_MSRS 15
struct {
DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76e1941db223..4990d6de5359 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10027,6 +10027,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (vcpu->arch.guest_fpu.xfd_err)
wrmsrl(MSR_IA32_XFD_ERR, 0);
+ if (vcpu->arch.xfd_no_write_intercept)
+ fpu_sync_guest_vmexit_xfd_state();
+
/*
* Consume any pending interrupts, including the possible source of
* VM-Exit on SVM and any ticks that occur between VM-Exit and now.
Powered by blists - more mailing lists