Message-ID: <aFs1OL8QybDRUQkF@google.com>
Date: Tue, 24 Jun 2025 16:31:04 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Jim Mattson <jmattson@...gle.com>
Cc: linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH v4 2/3] KVM: x86: Provide a capability to disable
APERF/MPERF read intercepts
On Fri, May 30, 2025, Jim Mattson wrote:
> @@ -7790,6 +7791,28 @@ all such vmexits.
>
> Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
>
> +Virtualizing the ``IA32_APERF`` and ``IA32_MPERF`` MSRs requires more
> +than just disabling APERF/MPERF exits. While both Intel and AMD
> +document strict usage conditions for these MSRs--emphasizing that only
> +the ratio of their deltas over a time interval (T0 to T1) is
> +architecturally defined--simply passing through the MSRs can still
> +produce an incorrect ratio.
> +
> +This erroneous ratio can occur if, between T0 and T1:
> +
> +1. The vCPU thread migrates between logical processors.
> +2. Live migration or suspend/resume operations take place.
> +3. Another task shares the vCPU's logical processor.
> +4. C-states lower than C0 are emulated (e.g., via HLT interception).
> +5. The guest TSC frequency doesn't match the host TSC frequency.
> +
> +Due to these complexities, KVM does not automatically associate this
> +passthrough capability with the guest CPUID bit,
> +``CPUID.6:ECX.APERFMPERF[bit 0]``. Userspace VMMs that deem this
> +mechanism adequate for virtualizing the ``IA32_APERF`` and
> +``IA32_MPERF`` MSRs must set the guest CPUID bit explicitly.
Question: what do we want to do about nested?  Due to differences between how
SVM and VMX handled MSR intercepts at the time you posted your patches, this
series _as posted_ will do nested passthrough for SVM, but not VMX (before the
MSR rework, SVM auto-merged the bitmaps for all MSRs in
svm_direct_access_msrs).

As I've got it locally applied, neither SVM nor VMX will do passthrough to L2.

I'm leaning toward allowing full passthrough, because (a) it's easy, (b) I can't
think of any reason not to, and (c) SVM's semi-auto-merging logic means we could
*unintentionally* do full passthrough in the future, in the unlikely event that
KVM added passthrough support for an MSR in the same chunk as APERF and MPERF.

This would be the extent of the changes (I think, haven't tested yet):

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 749f7b866ac8..b7fd2e869998 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -194,7 +194,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
  * Hardcode the capacity of the array based on the maximum number of _offsets_.
  * MSRs are batched together, so there are fewer offsets than MSRs.
  */
-static int nested_svm_msrpm_merge_offsets[6] __ro_after_init;
+static int nested_svm_msrpm_merge_offsets[7] __ro_after_init;
 static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
 
 typedef unsigned long nsvm_msrpm_merge_t;
@@ -216,6 +216,8 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
 		MSR_IA32_SPEC_CTRL,
 		MSR_IA32_PRED_CMD,
 		MSR_IA32_FLUSH_CMD,
+		MSR_IA32_APERF,
+		MSR_IA32_MPERF,
 		MSR_IA32_LASTBRANCHFROMIP,
 		MSR_IA32_LASTBRANCHTOIP,
 		MSR_IA32_LASTINTFROMIP,

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index c69df3aba8d1..b8ea1969113d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -715,6 +715,12 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_FLUSH_CMD, MSR_TYPE_W);
 
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_APERF, MSR_TYPE_R);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_MPERF, MSR_TYPE_R);
+
 	kvm_vcpu_unmap(vcpu, &map);
 
 	vmx->nested.force_msr_bitmap_recalc = false;