Message-ID: <b34bff98-9f2d-539f-7ee9-7bba09a8269a@intel.com>
Date: Mon, 30 Jan 2023 19:46:37 +0800
From: "Yang, Weijiang" <weijiang.yang@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: <pbonzini@...hat.com>, <jmattson@...gle.com>,
<kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<like.xu.linux@...il.com>, <kan.liang@...ux.intel.com>,
<wei.w.wang@...el.com>
Subject: Re: [PATCH v2 05/15] KVM: vmx/pmu: Emulate MSR_ARCH_LBR_DEPTH for
guest Arch LBR
On 1/28/2023 4:25 AM, Sean Christopherson wrote:
> On Thu, Nov 24, 2022, Yang Weijiang wrote:
>> [...]
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -571,6 +571,9 @@ struct kvm_pmu {
>> * redundant check before cleanup if guest don't use vPMU at all.
>> */
>> u8 event_count;
>> +
>> + /* Guest arch lbr depth supported by KVM. */
>> + u64 kvm_arch_lbr_depth;
> There is zero reason to store this separately. KVM already records the allowed
> depth in kvm_vcpu.lbr_desc.records.nr.
kvm_vcpu.lbr_desc.records.nr alone cannot tell whether the guest is using legacy
LBR or arch LBR unless it's combined with a host arch-LBR check.
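E.g., telling them apart without the new field would need something like this
(just a sketch, helper name made up):

static bool vcpu_uses_arch_lbr(struct kvm_vcpu *vcpu)
{
	/*
	 * lbr_desc.records.nr already holds the allowed depth; whether
	 * it's an arch LBR depth has to be inferred from KVM caps plus
	 * guest CPUID instead of a cached kvm_arch_lbr_depth.
	 */
	return kvm_cpu_cap_has(X86_FEATURE_ARCH_LBR) &&
	       guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR);
}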
>
>> };
>>
>> struct kvm_pmu_ops;
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index 905673228932..0c78cb4b72be 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> @@ -178,6 +178,10 @@ static bool intel_pmu_is_valid_lbr_msr(struct kvm_vcpu *vcpu, u32 index)
>> (index == MSR_LBR_SELECT || index == MSR_LBR_TOS))
>> return true;
>>
>> + if (index == MSR_ARCH_LBR_DEPTH)
>> + return kvm_cpu_cap_has(X86_FEATURE_ARCH_LBR) &&
> Like the previous patch, since intel_pmu_lbr_is_enabled() effectively serves as
> a generic kvm_cpu_cap_has(LBRS) check, this can be distilled to:
>
> 	if (cpu_feature_enabled(X86_FEATURE_ARCH_LBR)) {
> 		if (index == MSR_ARCH_LBR_DEPTH || index == MSR_ARCH_LBR_CTL)
> 			return true;
> 	} else {
> 		if (index == MSR_LBR_SELECT || index == MSR_LBR_TOS)
> 			return true;
> 	}
Yes, exactly, thanks!
>> + guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR);
>> +
>> if ((index >= records->from && index < records->from + records->nr) ||
>> (index >= records->to && index < records->to + records->nr))
>> return true;
>> @@ -345,6 +349,7 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> {
>> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> struct kvm_pmc *pmc;
>> + struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
>> u32 msr = msr_info->index;
>>
>> switch (msr) {
>> @@ -369,6 +374,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> case MSR_PEBS_DATA_CFG:
>> msr_info->data = pmu->pebs_data_cfg;
>> return 0;
>> + case MSR_ARCH_LBR_DEPTH:
>> + msr_info->data = lbr_desc->records.nr;
>> + return 0;
>> default:
>> if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
>> (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
>> @@ -395,6 +403,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> {
>> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> struct kvm_pmc *pmc;
>> + struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
>> u32 msr = msr_info->index;
>> u64 data = msr_info->data;
>> u64 reserved_bits, diff;
>> @@ -456,6 +465,24 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> return 0;
>> }
>> break;
>> + case MSR_ARCH_LBR_DEPTH:
>> + if (!pmu->kvm_arch_lbr_depth && !msr_info->host_initiated)
> Don't invent a new check, just prevent KVM from reaching this path via the
> existing intel_pmu_lbr_is_enabled().
intel_pmu_lbr_is_enabled() only indicates that LBR is on (either legacy or
arch LBR), but MSR_ARCH_LBR_DEPTH exists only for arch LBR.
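E.g., the guard would need to look something like this (sketch only, reusing
the existing helpers):

	case MSR_ARCH_LBR_DEPTH:
		/*
		 * intel_pmu_lbr_is_enabled() only says "LBRs are on";
		 * the guest CPUID bit narrows that to arch LBR, the
		 * only flavor that has this MSR.
		 */
		if (!intel_pmu_lbr_is_enabled(vcpu) ||
		    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
			return 1;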
>
>> + return 1;
>> + /*
>> + * When the guest and host depths are different, the handling would be
>> + * tricky, so only the max depth is supported for both host and guest.
>> + */
> This semi-arbitrary restriction is fine because Intel's architecture allows KVM
> to enumerate support for a single depth, but somewhere in the changelog and/or
> code that actually needs to be stated. This blurb
>
> In the first generation of Arch LBR, max entry size is 32,
> host configures the max size and guest always honors the setting.
>
> makes it sound like KVM is relying on the guest to do the right thing, and this
> code looks like KVM is making up its own behavior.
Will modify the change log.
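E.g., the comment could spell out the contract explicitly (suggested wording
only):

	/*
	 * Intel's architecture lets KVM enumerate support for a single
	 * arch LBR depth, and KVM enumerates only the host's maximum,
	 * so the sole legal write is that one value.
	 */
	if (data != pmu->kvm_arch_lbr_depth)
		return 1;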
>
>> + if (data != pmu->kvm_arch_lbr_depth)
>> + return 1;
>> +
>> + lbr_desc->records.nr = data;
>> + /*
>> + * A guest write to the depth MSR sets the MSR and also resets
>> + * the LBR records as a side effect.
>> + */
>> + if (kvm_cpu_cap_has(X86_FEATURE_ARCH_LBR))
> Another check, really? KVM shouldn't reach this point if KVM doesn't support
> Arch LBRs. And if that isn't guaranteed (honestly, I forgot what this series
> actually proposed at this point), then that's a bug, full stop.
Right, this check is unnecessary.
>
>> + wrmsrl(MSR_ARCH_LBR_DEPTH, lbr_desc->records.nr);
> IIUC, this is subtly broken. Piecing together all of the undocumented bits, my
> understanding is that arch LBRs piggyback on KVM's existing LBR support, i.e. use a
> "virtual" perf event.
Yes.
> And like traditional LBR support, the host can steal control
> of the LBRs in IRQ context by disabling the perf event via IPI. And since writes
> to MSR_ARCH_LBR_DEPTH purge LBR records, this needs to be treated as if it were a
> write to an LBR record, i.e. belongs in the IRQs disabled section of
> intel_pmu_handle_lbr_msrs_access().
I assume you're referring to host events preempting guest events. In that case,
it's possible for guest operations to interfere with host events/data. But this
series' implementation focuses on "guest only" mode, i.e., it sets
{Load|Clear}_LBR_CTL at VM entry/exit; that way we don't need to care about
host preemption, and the event data is saved/restored at event sched_{out|in}.
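For reference, the "guest only" wiring is roughly the following (control and
helper names follow this series, not upstream):

	/*
	 * Have the CPU load the guest's LBR_CTL at VM entry and clear
	 * it at VM exit, so the guest's LBR_CTL never takes effect in
	 * host context.
	 */
	vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_LBR_CTL);
	vm_exit_controls_setbit(vmx, VM_EXIT_CLEAR_IA32_LBR_CTL);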
>
> If for some magical reason it's safe to access arch LBR MSRs without disabling IRQs
> and confirming perf event ownership, I want to see a very detailed changelog
> explaining exactly how that magic works.
Will change the commit log to explain more.
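Roughly, moving the write into the IRQs-disabled window would look like this
(simplified sketch modeled on intel_pmu_handle_lbr_msrs_access(), helper name
made up):

static bool arch_lbr_depth_write(struct kvm_vcpu *vcpu, u64 data)
{
	struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
	bool ret = false;

	/*
	 * With IRQs off, the host can't reclaim the LBRs via IPI in the
	 * middle of the access; only write while KVM's perf event is
	 * still active, since the write also purges the LBR records.
	 */
	local_irq_disable();
	if (lbr_desc->event &&
	    lbr_desc->event->state == PERF_EVENT_STATE_ACTIVE) {
		wrmsrl(MSR_ARCH_LBR_DEPTH, data);
		ret = true;
	}
	local_irq_enable();

	return ret;
}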
>
>> + return 0;
>> default:
>> if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
>> (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
>> @@ -506,6 +533,32 @@ static void setup_fixed_pmc_eventsel(struct kvm_pmu *pmu)
>> }
>> }
>>
>> +static bool cpuid_enable_lbr(struct kvm_vcpu *vcpu)
>> +{
>> + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> + struct kvm_cpuid_entry2 *entry;
>> + int depth_bit;
>> +
>> + if (!kvm_cpu_cap_has(X86_FEATURE_ARCH_LBR))
>> + return !static_cpu_has(X86_FEATURE_ARCH_LBR) &&
>> + cpuid_model_is_consistent(vcpu);
>> +
>> + pmu->kvm_arch_lbr_depth = 0;
>> + if (!guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
>> + return false;
>> +
>> + entry = kvm_find_cpuid_entry(vcpu, 0x1C);
>> + if (!entry)
>> + return false;
>> +
>> + depth_bit = fls(cpuid_eax(0x1C) & 0xff);
> This is unnecessarily fragile. Get the LBR depth from perf; don't read CPUID and
> assume perf will always configure the max depth.
Make sense, will refactor the function in next version.
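Something along these lines, presumably (sketch; helper name made up, and it
assumes x86_perf_get_lbr() reports the host-configured depth):

static bool arch_lbr_depth_from_perf(struct kvm_vcpu *vcpu)
{
	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
	struct kvm_cpuid_entry2 *entry;
	struct x86_pmu_lbr lbr;

	/* Ask perf for the depth it actually configured on the host. */
	x86_perf_get_lbr(&lbr);
	if (!lbr.nr)
		return false;

	/*
	 * CPUID.0x1C:EAX[7:0] bit n enumerates depth 8 * (n + 1); the
	 * guest must enumerate exactly the perf-provided depth.
	 */
	entry = kvm_find_cpuid_entry(vcpu, 0x1C);
	if (!entry || (entry->eax & 0xff) != BIT(lbr.nr / 8 - 1))
		return false;

	pmu->kvm_arch_lbr_depth = lbr.nr;
	return true;
}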
>
> This enabling also belongs at the tail end of the series, i.e. KVM shouldn't let
> userspace enable LBRs until all the support pieces are in place.
OK.
>
>> + if ((entry->eax & 0xff) != (1 << (depth_bit - 1)))
>> + return false;
>> +
>> + pmu->kvm_arch_lbr_depth = depth_bit * 8;
>> + return true;
>> +}
>> +
[...]