Message-ID: <1489448438-29865-1-git-send-email-mst@redhat.com>
Date: Tue, 14 Mar 2017 01:44:39 +0200
From: "Michael S. Tsirkin" <mst@...hat.com>
To: linux-kernel@...r.kernel.org
Cc: "Gabriel L. Somlo" <gsomlo@...il.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Radim Krčmář <rkrcmar@...hat.com>,
Jonathan Corbet <corbet@....net>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
linux-doc@...r.kernel.org
Subject: [PATCH v3] kvm: better MWAIT emulation for guests
Guests running Mac OS X 10.5, 10.6, and 10.7 (Leopard through Lion) have
a problem: unless explicitly given the kernel command line argument
"idlehalt=0", they implicitly assume that MONITOR and MWAIT are
available, without checking CPUID.
We currently emulate these instructions as a NOP, but on VMX we can do
better: let the guest stop the CPU until a timer, an IPI or a memory
change wakes it. The CPU will still look busy to the host, but that isn't
any worse than NOP emulation.
Note that mwait within guests is not the same as on real hardware
because halt causes an exit while mwait doesn't. For this reason it
might not be a good idea to use the regular MWAIT flag in CPUID to
signal this capability. Add a flag in the hypervisor leaf instead.
Additionally, we add a capability for QEMU - e.g. if it knows there's an
isolated CPU dedicated for the VCPU it can set the standard MWAIT flag
to improve guest behaviour.
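As an illustration of how a guest might consume the new flag, here is a
minimal sketch (not part of the patch). It assumes the guest has already
read EAX from KVM's paravirtual feature leaf (KVM_CPUID_FEATURES,
0x40000001). The helper names are hypothetical; only the bit number 8
comes from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit number proposed by this patch (arch/x86/include/uapi/asm/kvm_para.h). */
#define KVM_FEATURE_MWAIT 8

/* Hypothetical helper: test a single feature bit in a 32-bit CPUID register. */
static bool cpuid_feature_bit_set(uint32_t reg, unsigned int bit)
{
	assert(bit < 32);	/* shifting by 32 or more is undefined behaviour */
	return (reg >> bit) & 1u;
}

/*
 * Hypothetical helper: features_eax is assumed to be the EAX value the
 * guest obtained from the KVM feature leaf. Returns true if the host
 * advertises that MONITOR/MWAIT will not cause exits.
 */
static bool guest_may_mwait_without_exit(uint32_t features_eax)
{
	return cpuid_feature_bit_set(features_eax, KVM_FEATURE_MWAIT);
}
```

A guest would presumably keep using HLT when the helper returns false,
since MWAIT would then either be emulated as a NOP or be unavailable.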
Reported-by: "Gabriel L. Somlo" <gsomlo@...il.com>
Signed-off-by: Michael S. Tsirkin <mst@...hat.com>
---
Note: SVM bits are untested at this point. Seems pretty
obvious though.
changes from v2:
- add a capability to allow host userspace to detect new kernels
- more documentation to clarify the semantics of the feature flag
and why it's useful
- svm support as suggested by Radim
changes from v1:
- typo fix resulting in rest of leaf flags being overwritten
Reported by: Wanpeng Li <kernellwp@...il.com>
- updated commit log with data about guests helped by this feature
- better document differences between mwait and halt for guests
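
For the host userspace side mentioned in the v2 changes, the capability
could be probed roughly as follows. This is a sketch, not QEMU code: the
helper names are made up, and the fallback value 137 is the number
proposed in this patch, so installed headers may define a different one.
KVM_CHECK_EXTENSION is the existing KVM ioctl.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * Assumption: headers that predate the capability do not define it.
 * 137 is the value proposed by this patch; if the installed headers
 * define the capability, their value takes precedence.
 */
#ifndef KVM_CAP_X86_GUEST_MWAIT
#define KVM_CAP_X86_GUEST_MWAIT 137
#endif

/* Hypothetical helper: query an already opened KVM file descriptor. */
static bool kvm_fd_has_guest_mwait(int kvm_fd)
{
	/* KVM_CHECK_EXTENSION returns 0 if unsupported, -1 on error. */
	int r = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_X86_GUEST_MWAIT);

	return r > 0;
}

/* Hypothetical helper: open /dev/kvm, query, and clean up. */
static bool host_has_guest_mwait(void)
{
	int fd = open("/dev/kvm", O_RDWR);
	bool supported;

	if (fd < 0)
		return false;	/* no KVM at all, so no guest MWAIT either */

	supported = kvm_fd_has_guest_mwait(fd);
	close(fd);
	return supported;
}
```

Userspace that also knows the VCPU has a dedicated host CPU could then
decide to expose the standard MWAIT CPUID flag to the guest.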

 Documentation/virtual/kvm/api.txt    | 12 ++++++------
 Documentation/virtual/kvm/cpuid.txt  |  6 ++++++
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kvm/cpuid.c                 |  3 +++
 arch/x86/kvm/svm.c                   |  2 --
 arch/x86/kvm/vmx.c                   |  4 ----
 arch/x86/kvm/x86.c                   |  3 +++
 include/uapi/linux/kvm.h             |  1 +
 8 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 0694509..c7beb07 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4135,11 +4135,11 @@ available, means that that the kernel can support guests using the
radix MMU defined in Power ISA V3.00 (as implemented in the POWER9
processor).
-8.4 KVM_CAP_PPC_HASH_MMU_V3
+8.5 KVM_CAP_X86_GUEST_MWAIT
-Architectures: ppc
+Architectures: x86
-This capability, if KVM_CHECK_EXTENSION indicates that it is
-available, means that that the kernel can support guests using the
-hashed page table MMU defined in Power ISA V3.00 (as implemented in
-the POWER9 processor), including in-memory segment tables.
+This capability indicates that a guest using memory monitoring instructions
+(MWAIT/MWAITX) to stop the virtual CPU will not cause a VM exit. As such, time
+spent while the virtual CPU is halted in this way will be accounted for as
+guest running time on the host (as opposed to e.g. HLT).
diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..04c201c 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT || 7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_MWAIT                  ||     8 || guest can use monitor/mwait
+                                   ||       || to halt the VCPU without exits,
+                                   ||       || time spent while halted in this
+                                   ||       || way is accounted for on host as
+                                   ||       || VCPU run time.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index cff0bb6..9cc77a7 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
#define KVM_FEATURE_STEAL_TIME 5
#define KVM_FEATURE_PV_EOI 6
#define KVM_FEATURE_PV_UNHALT 7
+#define KVM_FEATURE_MWAIT 8
/* The last 8 bits are used to indicate how to interpret the flags field
* in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index efde6cc..3c7fca83 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -594,6 +594,9 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);

+		if (this_cpu_has(X86_FEATURE_MWAIT))
+			entry->eax |= (1 << KVM_FEATURE_MWAIT);
+
 		entry->ebx = 0;
 		entry->ecx = 0;
 		entry->edx = 0;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d1efe2c..18e53bc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1198,8 +1198,6 @@ static void init_vmcb(struct vcpu_svm *svm)
 	set_intercept(svm, INTERCEPT_CLGI);
 	set_intercept(svm, INTERCEPT_SKINIT);
 	set_intercept(svm, INTERCEPT_WBINVD);
-	set_intercept(svm, INTERCEPT_MONITOR);
-	set_intercept(svm, INTERCEPT_MWAIT);
 	set_intercept(svm, INTERCEPT_XSETBV);

 	control->iopm_base_pa = iopm_base;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4bfe349..b167aba 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3547,13 +3547,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 	      CPU_BASED_USE_IO_BITMAPS |
 	      CPU_BASED_MOV_DR_EXITING |
 	      CPU_BASED_USE_TSC_OFFSETING |
-	      CPU_BASED_MWAIT_EXITING |
-	      CPU_BASED_MONITOR_EXITING |
 	      CPU_BASED_INVLPG_EXITING |
 	      CPU_BASED_RDPMC_EXITING;

-	printk(KERN_ERR "cleared CPU_BASED_MWAIT_EXITING + CPU_BASED_MONITOR_EXITING\n");
-
 	opt = CPU_BASED_TPR_SHADOW |
 	      CPU_BASED_USE_MSR_BITMAPS |
 	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1faf620..a507635 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2684,6 +2684,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ADJUST_CLOCK:
 		r = KVM_CLOCK_TSC_STABLE;
 		break;
+	case KVM_CAP_X86_GUEST_MWAIT:
+		r = !!this_cpu_has(X86_FEATURE_MWAIT);
+		break;
 	case KVM_CAP_X86_SMM:
 		/* SMBASE is usually relocated above 1M on modern chipsets,
 		 * and SMM handlers might indeed rely on 4G segment limits,
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f51d508..8b6bc06 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -883,6 +883,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_PPC_MMU_RADIX 134
#define KVM_CAP_PPC_MMU_HASH_V3 135
#define KVM_CAP_IMMEDIATE_EXIT 136
+#define KVM_CAP_X86_GUEST_MWAIT 137
#ifdef KVM_CAP_IRQ_ROUTING
--
MST