Message-ID: <aJOO99xUrhsrvLwl@linux.dev>
Date: Wed, 6 Aug 2025 10:20:55 -0700
From: Oliver Upton <oliver.upton@...ux.dev>
To: Akihiko Odaki <odaki@....ci.i.u-tokyo.ac.jp>
Cc: Marc Zyngier <maz@...nel.org>, Joey Gouly <joey.gouly@....com>,
Suzuki K Poulose <suzuki.poulose@....com>,
Zenghui Yu <yuzenghui@...wei.com>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, Kees Cook <kees@...nel.org>,
"Gustavo A. R. Silva" <gustavoars@...nel.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Jonathan Corbet <corbet@....net>, Shuah Khan <shuah@...nel.org>,
linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev,
linux-kernel@...r.kernel.org, linux-hardening@...r.kernel.org,
devel@...nix.com, kvm@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kselftest@...r.kernel.org
Subject: Re: [PATCH RFC v2 1/2] KVM: arm64: PMU: Introduce
KVM_ARM_VCPU_PMU_V3_COMPOSITION
Hi Akihiko,
This is an unreasonably large patch; it needs to be broken down into
smaller patches, ideally one functional change per patch. We need that
even for an RFC so the series can actually be reviewed.
On Wed, Aug 06, 2025 at 06:09:54PM +0900, Akihiko Odaki wrote:
> +static u64 kvm_pmu_get_pmc_value(struct kvm_vcpu *vcpu, u8 idx)
> {
> - struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
> + struct kvm_pmc *pmc = *kvm_vcpu_idx_to_pmc(vcpu, idx);
> u64 counter, reg, enabled, running;
> + unsigned int i;
>
> - reg = counter_index_to_reg(pmc->idx);
> + reg = counter_index_to_reg(idx);
> counter = __vcpu_sys_reg(vcpu, reg);
>
> /*
> * The real counter value is equal to the value of counter register plus
> * the value perf event counts.
> */
> - if (pmc->perf_event)
> - counter += perf_event_read_value(pmc->perf_event, &enabled,
> - &running);
> + if (pmc)
> + for (i = 0; i < pmc->nr_perf_events; i++)
> + counter += perf_event_read_value(pmc->perf_events[i],
> + &enabled, &running);
I'm concerned that the array-of-events concept you're introducing is
going to be error-prone. An approach that reallocates the perf event
when a vCPU migrates to a new PMU implementation would be preferable.
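Something along these lines (untested sketch; helper names follow the
pre-patch code where they exist, and the signatures are approximate):

        /*
         * Hypothetical sketch: instead of carrying an array of perf events
         * per counter, fold the in-flight count into the counter register,
         * release the old event, and recreate it against the PMU the vCPU
         * is now running on.
         */
        static void kvm_pmu_reattach_perf_event(struct kvm_vcpu *vcpu, u8 idx)
        {
                struct kvm_pmc *pmc = kvm_vcpu_idx_to_pmc(vcpu, idx);

                /* Snapshot the current count into PMEVCNTR<n>_EL0... */
                kvm_pmu_set_counter_value(vcpu, idx,
                                          kvm_pmu_get_pmc_value(pmc));

                /* ...then drop the old event and open a new one on this PMU. */
                kvm_pmu_release_perf_event(pmc);
                kvm_pmu_create_perf_event(pmc);
        }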
> +static void reset_sample_period(struct perf_event *perf_event)
> +{
> + struct kvm_pmc **pmc = perf_event->overflow_handler_context;
> + struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
> + struct arm_pmu *cpu_pmu = to_arm_pmu(perf_event->pmu);
> + u64 period;
> +
> + cpu_pmu->pmu.stop(perf_event, PERF_EF_UPDATE);
> +
> + /*
> + * Reset the sample period to the architectural limit,
> + * i.e. the point where the counter overflows.
> + */
> + period = compute_period(pmc, kvm_pmu_get_pmc_value(vcpu, (*pmc)->idx));
> +
> + local64_set(&perf_event->hw.period_left, 0);
> + perf_event->attr.sample_period = period;
> + perf_event->hw.sample_period = period;
> +
> + cpu_pmu->pmu.start(perf_event, PERF_EF_RELOAD);
> +}
No, we can't start calling into the driver's internal interfaces. The
pointer we keep to the PMU is already an ugly hack and shouldn't be
abused like this.
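If the sample period really needs reprogramming here, perf core already
provides perf_event_period() for that; something like (untested sketch,
helper names per the pre-patch code):

        /*
         * Untested sketch: let perf core stop, reprogram and restart the
         * event rather than poking the driver's pmu.stop()/pmu.start()
         * callbacks behind its back.
         */
        static void reset_sample_period(struct kvm_pmc *pmc)
        {
                u64 period = compute_period(pmc, kvm_pmu_get_pmc_value(pmc));

                perf_event_period(pmc->perf_event, period);
        }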
> @@ -725,8 +729,8 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
> attr.type = arm_pmu->pmu.type;
> attr.size = sizeof(attr);
> attr.pinned = 1;
> - attr.disabled = !kvm_pmu_counter_is_enabled(pmc);
> - attr.exclude_user = !kvm_pmc_counts_at_el0(pmc);
> + attr.disabled = !kvm_pmu_counter_is_enabled(vcpu, (*pmc)->idx);
> + attr.exclude_user = !kvm_pmc_counts_at_el0(vcpu, (*pmc)->idx);
> attr.exclude_hv = 1; /* Don't count EL2 events */
> attr.exclude_host = 1; /* Don't count host events */
> attr.config = eventsel;
Can we just special-case the fixed CPU cycle counter to use
PERF_TYPE_HARDWARE / PERF_COUNT_HW_CPU_CYCLES? That _should_ have the
intended effect of opening an event on the PMU for this CPU.
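i.e. something like this in kvm_pmu_create_perf_event() (untested):

        /*
         * Untested sketch: the fixed cycle counter doesn't need a
         * PMU-private event type, so use the generic hardware event and
         * let perf pick the right PMU for whichever CPU the vCPU lands on.
         */
        if (pmc->idx == ARMV8_PMU_CYCLE_IDX) {
                attr.type = PERF_TYPE_HARDWARE;
                attr.config = PERF_COUNT_HW_CPU_CYCLES;
        } else {
                attr.type = arm_pmu->pmu.type;
                attr.config = eventsel;
        }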
> + /*
> + * If we have a filter in place and that the event isn't allowed, do
> + * not install a perf event either.
> + */
> + if (vcpu->kvm->arch.pmu_filter &&
> + !test_bit(eventsel, vcpu->kvm->arch.pmu_filter))
> + return;
> +
> + if (arm_pmu) {
> + *pmc = kvm_pmu_alloc_pmc(idx, 1);
> + if (!*pmc)
> + goto err;
> +
> + kvm_pmu_create_perf_event(pmc, arm_pmu, eventsel);
> + } else {
> + guard(mutex)(&arm_pmus_lock);
This is a system-wide lock, and the need for it goes away entirely if
you take the reallocation approach I mentioned above.
> +static int kvm_arm_pmu_v3_set_pmu_composition(struct kvm_vcpu *vcpu)
> +{
> + struct kvm *kvm = vcpu->kvm;
> + struct arm_pmu_entry *entry;
> + struct arm_pmu *arm_pmu;
> +
> + lockdep_assert_held(&kvm->arch.config_lock);
> +
> + if (kvm_vm_has_ran_once(kvm) ||
> + (kvm->arch.pmu_filter && !kvm->arch.nr_composed_host_pmus))
> + return -EBUSY;
I'm not sure there's much value in preventing the user from configuring
the PMU event filter. Even in the case of the fixed CPU cycle counter we
allow userspace to filter the event.
It is much more important to have mutual exclusion between this UAPI and
userspace explicitly selecting a PMU implementation.
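Roughly (untested; the 'explicitly set' flag is hypothetical, since I
don't think we track that today):

        /*
         * Untested sketch: fail the composition attribute if userspace has
         * already pinned the VM to a particular PMU instance. The
         * 'pmu_explicitly_set' flag is hypothetical; kvm_arm_pmu_v3_set_pmu()
         * would need to set it and perform the mirror-image check.
         */
        if (kvm_vm_has_ran_once(kvm) || kvm->arch.pmu_explicitly_set)
                return -EBUSY;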
> @@ -1223,6 +1328,8 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>
> return kvm_arm_pmu_v3_set_nr_counters(vcpu, n);
> }
> + case KVM_ARM_VCPU_PMU_V3_COMPOSITION:
> + return kvm_arm_pmu_v3_set_pmu_composition(vcpu);
I'd prefer naming this something like 'KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY'.
We will eventually have the fixed instruction counter as well, which is
another event we could potentially provide system-wide.
Thanks,
Oliver