Message-ID: <CAL715WJXWQgfzgh8KqL+pAzeqL+dkF6imfRM37nQ6PkZd09mhQ@mail.gmail.com>
Date: Sun, 14 Apr 2024 23:06:17 -0700
From: Mingwei Zhang <mizhang@...gle.com>
To: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
Cc: Sean Christopherson <seanjc@...gle.com>, Xiong Zhang <xiong.y.zhang@...ux.intel.com>,
pbonzini@...hat.com, peterz@...radead.org, kan.liang@...el.com,
zhenyuw@...ux.intel.com, jmattson@...gle.com, kvm@...r.kernel.org,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
zhiyuan.lv@...el.com, eranian@...gle.com, irogers@...gle.com,
samantha.alt@...el.com, like.xu.linux@...il.com, chao.gao@...el.com
Subject: Re: [RFC PATCH 23/41] KVM: x86/pmu: Implement the save/restore of PMU
state for Intel CPU
On Fri, Apr 12, 2024 at 9:25 PM Mi, Dapeng <dapeng1.mi@...ux.intel.com> wrote:
>
>
> On 4/13/2024 11:34 AM, Mingwei Zhang wrote:
> > On Sat, Apr 13, 2024, Mi, Dapeng wrote:
> >> On 4/12/2024 5:44 AM, Sean Christopherson wrote:
> >>> On Fri, Jan 26, 2024, Xiong Zhang wrote:
> >>>> From: Dapeng Mi <dapeng1.mi@...ux.intel.com>
> >>>>
> >>>> Implement the save/restore of PMU state for the passthrough PMU on Intel. In
> >>>> passthrough mode, KVM exclusively owns the PMU HW while control flow is within
> >>>> the scope of the passthrough PMU. Thus, KVM needs to save the host PMU state
> >>>> and take full ownership of the PMU HW. Conversely, the host regains ownership
> >>>> of the PMU HW from KVM when control flow leaves the scope of the passthrough
> >>>> PMU.
> >>>>
> >>>> Implement PMU context switches for Intel CPUs and opportunistically use
> >>>> rdpmcl() instead of rdmsrl() when reading counters, since the former has
> >>>> lower latency on Intel CPUs.
> >>>>
> >>>> Co-developed-by: Mingwei Zhang <mizhang@...gle.com>
> >>>> Signed-off-by: Mingwei Zhang <mizhang@...gle.com>
> >>>> Signed-off-by: Dapeng Mi <dapeng1.mi@...ux.intel.com>
> >>>> ---
> >>>> arch/x86/kvm/vmx/pmu_intel.c | 73 ++++++++++++++++++++++++++++++++++++
> >>>> 1 file changed, 73 insertions(+)
> >>>>
> >>>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> >>>> index 0d58fe7d243e..f79bebe7093d 100644
> >>>> --- a/arch/x86/kvm/vmx/pmu_intel.c
> >>>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> >>>> @@ -823,10 +823,83 @@ void intel_passthrough_pmu_msrs(struct kvm_vcpu *vcpu)
> >>>> static void intel_save_pmu_context(struct kvm_vcpu *vcpu)
> >>> I would prefer there be a "guest" in there somewhere, e.g. intel_save_guest_pmu_context().
> >> Yeah. It looks clearer.
> >>>> {
> >>>> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> >>>> +	struct kvm_pmc *pmc;
> >>>> +	u32 i;
> >>>> +
> >>>> +	if (pmu->version != 2) {
> >>>> +		pr_warn("only PerfMon v2 is supported for passthrough PMU");
> >>>> +		return;
> >>>> +	}
> >>>> +
> >>>> +	/* Global ctrl register is already saved at VM-exit. */
> >>>> +	rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, pmu->global_status);
> >>>> +	/* Clear hardware MSR_CORE_PERF_GLOBAL_STATUS MSR, if non-zero. */
> >>>> +	if (pmu->global_status)
> >>>> +		wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, pmu->global_status);
> >>>> +
> >>>> +	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
> >>>> +		pmc = &pmu->gp_counters[i];
> >>>> +		rdpmcl(i, pmc->counter);
> >>>> +		rdmsrl(i + MSR_ARCH_PERFMON_EVENTSEL0, pmc->eventsel);
> >>>> +		/*
> >>>> +		 * Clear hardware PERFMON_EVENTSELx and its counter to avoid leakage
> >>>> +		 * and also to avoid this guest GP counter being accidentally enabled
> >>>> +		 * while the host is running, when the host enables global ctrl.
> >>>> +		 */
> >>>> +		if (pmc->eventsel)
> >>>> +			wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, 0);
> >>>> +		if (pmc->counter)
> >>>> +			wrmsrl(MSR_IA32_PMC0 + i, 0);
> >>> This doesn't make much sense. The kernel already has full access to the guest;
> >>> I don't see what is gained by zeroing out the MSRs just to hide them from perf.
> >> It's necessary to clear the EVENTSELx MSRs for both GP and fixed counters.
> >> Consider this case: the guest uses GP counter 2, but the host doesn't use it.
> >> If the EVENTSEL2 MSR is not cleared here, GP counter 2 would be enabled
> >> unexpectedly on the host later, since host perf always enables all valid
> >> bits in the PERF_GLOBAL_CTRL MSR. That would cause issues.
> >>
> >> Yeah, the clearing of the PMCx MSRs should be unnecessary.
> >>
> > Why is clearing the PMCx MSRs unnecessary? Do we want to leak counter
> > values to the host? NO. Not in cloud usage.
>
> No, this place clears the guest counter value, not the host counter
> value. The host always has a way to see the guest's values in a normal VM
> if it wants to. I don't see its necessity; it's just overkill and
> introduces extra overhead from writing MSRs.
>
I am curious how the perf subsystem solves this problem. Does the perf
subsystem in the host scrub only the selector, but not the counter value,
when doing a context switch?
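
In other words, a rough sketch of what I imagine (hypothetical, not actual
perf code) would be something like this on the save side:

	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
		pmc = &pmu->gp_counters[i];
		rdpmcl(i, pmc->counter);
		rdmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, pmc->eventsel);
		/* Scrub only the selector so the counter cannot keep counting... */
		if (pmc->eventsel)
			wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, 0);
		/* ...but leave the stale count in MSR_IA32_PMC0 + i untouched. */
	}

i.e., is leaving the stale counter value behind considered acceptable on the
host side, or does perf clear it as well?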
>
> >
> > Please make changes to this patch with **extreme** caution.
> >
> > According to our past experience, if there is a bug somewhere, there is
> > normally a catch here.
> >
> > Thanks.
> > -Mingwei
> >>> Similarly, if perf enables a counter via PERF_GLOBAL_CTRL without first restoring
> >>> the event selector, we've got problems.
> >>>
> >>> Same thing for the fixed counters below. Can't this just be?
> >>>
> >>>	for (i = 0; i < pmu->nr_arch_gp_counters; i++)
> >>>		rdpmcl(i, pmu->gp_counters[i].counter);
> >>>
> >>>	for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
> >>>		rdpmcl(INTEL_PMC_FIXED_RDPMC_BASE | i,
> >>>		       pmu->fixed_counters[i].counter);
> >>>
> >>>> +	}
> >>>> +
> >>>> +	rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, pmu->fixed_ctr_ctrl);
> >>>> +	/*
> >>>> +	 * Clear hardware FIXED_CTR_CTRL MSR to avoid information leakage and
> >>>> +	 * also to avoid these guest fixed counters being accidentally enabled
> >>>> +	 * while the host is running, when the host enables global ctrl.
> >>>> +	 */
> >>>> +	if (pmu->fixed_ctr_ctrl)
> >>>> +		wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> >>>> +	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> >>>> +		pmc = &pmu->fixed_counters[i];
> >>>> +		rdpmcl(INTEL_PMC_FIXED_RDPMC_BASE | i, pmc->counter);
> >>>> +		if (pmc->counter)
> >>>> +			wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 0);
> >>>> +	}
> >>>> }
> >>>> static void intel_restore_pmu_context(struct kvm_vcpu *vcpu)
> >>>> {
> >>>> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> >>>> +	struct kvm_pmc *pmc;
> >>>> +	u64 global_status;
> >>>> +	int i;
> >>>> +
> >>>> +	if (pmu->version != 2) {
> >>>> +		pr_warn("only PerfMon v2 is supported for passthrough PMU");
> >>>> +		return;
> >>>> +	}
> >>>> +
> >>>> +	/* Clear host global_ctrl and global_status MSR if non-zero. */
> >>>> +	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> >>> Why? PERF_GLOBAL_CTRL will be auto-loaded at VM-Enter, why do it now?
> >> As previous comments say, host perf always enables all counters in
> >> PERF_GLOBAL_CTRL by default. The reason to clear PERF_GLOBAL_CTRL here is to
> >> ensure all counters are in a disabled state, so that the later counter
> >> manipulation (writing MSRs) won't cause any race condition or unexpected
> >> behavior on the HW.
> >>
> >>
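Just to spell out the ordering being relied on here (a sketch only; it assumes
PERF_GLOBAL_CTRL is auto-loaded with the guest value at VM-Enter, as set up
elsewhere in this series):

	/* 1) Stop all counting so the MSR writes below cannot take effect early. */
	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);

	/* 2) Now it is safe to load the guest counters and event selectors. */
	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
		wrmsrl(MSR_IA32_PMC0 + i, pmu->gp_counters[i].counter);
		wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, pmu->gp_counters[i].eventsel);
	}

	/* 3) Counting resumes only when VM-Enter loads the guest's global ctrl. */

If step 1 were skipped, writing a guest EVENTSELx whose bit is still set in the
host's PERF_GLOBAL_CTRL would start counting in host context before VM-Enter.
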
> >>>> +	rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, global_status);
> >>>> +	if (global_status)
> >>>> +		wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, global_status);
> >>> This seems especially silly, isn't the full MSR being written below? Or am I
> >>> misunderstanding how these things work?
> >> I think Jim's comment has already explained why we need to do this.
> >>
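For the record, my understanding of the "why" (please correct me if this is
wrong): GLOBAL_STATUS is not directly writable; GLOBAL_OVF_CTRL can only clear
bits and GLOBAL_STATUS_SET can only set bits, so loading the guest's snapshot
takes two steps, roughly:

	/* Clear whatever overflow bits the host left behind... */
	rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, global_status);
	if (global_status)
		wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, global_status);
	/* ...then set exactly the bits recorded in the guest's snapshot. */
	wrmsrl(MSR_CORE_PERF_GLOBAL_STATUS_SET, pmu->global_status);
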
> >>>> +	wrmsrl(MSR_CORE_PERF_GLOBAL_STATUS_SET, pmu->global_status);
> >>>> +
> >>>> +	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
> >>>> +		pmc = &pmu->gp_counters[i];
> >>>> +		wrmsrl(MSR_IA32_PMC0 + i, pmc->counter);
> >>>> +		wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, pmc->eventsel);
> >>>> +	}
> >>>> +
> >>>> +	wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, pmu->fixed_ctr_ctrl);
> >>>> +	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> >>>> +		pmc = &pmu->fixed_counters[i];
> >>>> +		wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, pmc->counter);
> >>>> +	}
> >>>> }
> >>>> struct kvm_pmu_ops intel_pmu_ops __initdata = {
> >>>> --
> >>>> 2.34.1
> >>>>