Message-ID: <20230525142031.GU83892@hirez.programming.kicks-ass.net>
Date: Thu, 25 May 2023 16:20:31 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Oliver Upton <oliver.upton@...ux.dev>
Cc: Ravi Bangoria <ravi.bangoria@....com>,
Nathan Chancellor <nathan@...nel.org>, namhyung@...nel.org,
eranian@...gle.com, acme@...nel.org, mark.rutland@....com,
jolsa@...nel.org, irogers@...gle.com, bp@...en8.de,
kan.liang@...ux.intel.com, adrian.hunter@...el.com,
maddy@...ux.ibm.com, x86@...nel.org,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
sandipan.das@....com, ananth.narayan@....com,
santosh.shukla@....com, maz@...nel.org, kvmarm@...ts.linux.dev
Subject: Re: [PATCH v4 3/4] perf/core: Remove pmu linear searching code
On Thu, May 25, 2023 at 07:11:41AM +0000, Oliver Upton wrote:
> The PMUv3 driver does pass a name, but it relies on getting back an
> allocated pmu id as @type is -1 in the call to perf_pmu_register().
>
> What actually broke is how KVM probes for a default core PMU to use for
> a guest. kvm_pmu_probe_armpmu() creates a counter w/ PERF_TYPE_RAW and
> reads the pmu from the returned perf_event. The linear search had the
> effect of eventually stumbling on the correct core PMU and succeeding.
>
> Perf folks: is this WAI for heterogeneous systems?
TBH, I'm not sure. hetero and virt don't mix very well AFAIK and I'm not
sure what ARM64 does here.
IIRC the only way is to hard affine things; that is, force vCPU of
'type' to the pCPU mask of 'type' CPUs.
If you don't do that; or let userspace 'override' that, things go
sideways *real* fast.
Mark gonna have to look at this.
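For readers unfamiliar with hard affinity: one userspace way to restrict a vCPU thread to a set of physical CPUs is sched_setaffinity(2). The sketch below is only an illustration under that assumption; the message does not say which mechanism it has in mind, and pin_self_to_cpus() is a hypothetical helper name, not part of KVM or perf.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/*
 * Hypothetical helper: restrict the calling thread (for example a
 * thread that runs a vCPU) to the physical CPUs listed in cpus[0..n).
 * Returns 0 on success, -1 on failure with errno set by the kernel.
 */
static int pin_self_to_cpus(const int *cpus, size_t n)
{
	cpu_set_t set;
	size_t i;

	assert(cpus != NULL);

	CPU_ZERO(&set);
	for (i = 0; i < n; i++)
		CPU_SET(cpus[i], &set);

	/* pid 0 means "the calling thread". */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

A caller would pass the CPU numbers that make up one 'type' (say, all the big cores) before entering the vCPU run loop. Nothing here stops the mask from being changed later, which is the 'override' case described above.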
> Either way, the whole KVM end of this scheme is a bit clunky, and I
> believe it to be unnecessary at this point as we maintain a list of
> core PMU instances that KVM is able to virtualize. We can just walk
> that to find a default PMU to use.
>
> Not seeing any issues on -next with the below diff. If this works for
> folks I can actually wrap it up in a patch and send it out.
>
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index 45727d50d18d..cbc0b662b7f8 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -694,47 +694,26 @@ void kvm_host_pmu_init(struct arm_pmu *pmu)
>
> static struct arm_pmu *kvm_pmu_probe_armpmu(void)
> {
> - struct perf_event_attr attr = { };
> - struct perf_event *event;
> - struct arm_pmu *pmu = NULL;
> -
> - /*
> - * Create a dummy event that only counts user cycles. As we'll never
> - * leave this function with the event being live, it will never
> - * count anything. But it allows us to probe some of the PMU
> - * details. Yes, this is terrible.
> - */
> - attr.type = PERF_TYPE_RAW;
> - attr.size = sizeof(attr);
> - attr.pinned = 1;
> - attr.disabled = 0;
> - attr.exclude_user = 0;
> - attr.exclude_kernel = 1;
> - attr.exclude_hv = 1;
> - attr.exclude_host = 1;
> - attr.config = ARMV8_PMUV3_PERFCTR_CPU_CYCLES;
> - attr.sample_period = GENMASK(63, 0);
> + struct arm_pmu *arm_pmu = NULL, *tmp;
> + struct arm_pmu_entry *entry;
> + int cpu;
>
> - event = perf_event_create_kernel_counter(&attr, -1, current,
> - kvm_pmu_perf_overflow, &attr);
> + mutex_lock(&arm_pmus_lock);
> + cpu = get_cpu();
>
> - if (IS_ERR(event)) {
> - pr_err_once("kvm: pmu event creation failed %ld\n",
> - PTR_ERR(event));
> - return NULL;
> - }
> + list_for_each_entry(entry, &arm_pmus, entry) {
> + tmp = entry->arm_pmu;
>
> - if (event->pmu) {
> - pmu = to_arm_pmu(event->pmu);
> - if (pmu->pmuver == ID_AA64DFR0_EL1_PMUVer_NI ||
> - pmu->pmuver == ID_AA64DFR0_EL1_PMUVer_IMP_DEF)
> - pmu = NULL;
> + if (cpumask_test_cpu(cpu, &tmp->supported_cpus)) {
> + arm_pmu = tmp;
> + break;
> + }
> }
>
> - perf_event_disable(event);
> - perf_event_release_kernel(event);
> + put_cpu();
> + mutex_unlock(&arm_pmus_lock);
>
> - return pmu;
> + return arm_pmu;
> }
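
The selection logic in the quoted diff can be modelled outside the kernel. The sketch below is a simplified stand-in, not the kernel code: struct model_pmu and model_probe_pmu() are hypothetical names, a 64-bit mask replaces the cpumask, and an array replaces the arm_pmus list. It only shows the "first PMU whose supported CPUs include the probing CPU" rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a core PMU instance. */
struct model_pmu {
	const char *name;
	uint64_t supported_cpus;	/* bit N set: CPU N is covered */
};

/*
 * Return the first PMU that covers @cpu, or NULL if none does.
 * Mirrors the list walk in the diff: first match wins.
 */
static const struct model_pmu *
model_probe_pmu(const struct model_pmu *pmus, size_t n, unsigned int cpu)
{
	size_t i;

	assert(cpu < 64);	/* limit of this model's 64-bit mask */

	for (i = 0; i < n; i++) {
		if (pmus[i].supported_cpus & (UINT64_C(1) << cpu))
			return &pmus[i];
	}
	return NULL;
}
```

With two PMUs covering CPUs 0-3 and 4-7, a probe from CPU 1 returns the first and a probe from CPU 5 returns the second. In this model, then, the default depends on which CPU the probe happens to run on, which is the heterogeneous-system concern raised earlier in the thread.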