[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e8324d9d-3756-41cf-a102-28572e302368@daynix.com>
Date: Wed, 19 Mar 2025 17:37:29 +0900
From: Akihiko Odaki <akihiko.odaki@...nix.com>
To: Oliver Upton <oliver.upton@...ux.dev>
Cc: Marc Zyngier <maz@...nel.org>, Joey Gouly <joey.gouly@....com>,
Suzuki K Poulose <suzuki.poulose@....com>, Zenghui Yu
<yuzenghui@...wei.com>, Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, Kees Cook <kees@...nel.org>,
"Gustavo A. R. Silva" <gustavoars@...nel.org>,
linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev,
linux-kernel@...r.kernel.org, linux-hardening@...r.kernel.org,
devel@...nix.com
Subject: Re: [PATCH RFC] KVM: arm64: PMU: Use multiple host PMUs
On 2025/03/19 16:34, Oliver Upton wrote:
> Hi Akihiko,
>
> On Wed, Mar 19, 2025 at 03:33:46PM +0900, Akihiko Odaki wrote:
>> Problem
>> -------
>>
>> arch/arm64/kvm/pmu-emul.c used to have a comment saying the follows:
>>> The observant among you will notice that the supported_cpus
>>> mask does not get updated for the default PMU even though it
>>> is quite possible the selected instance supports only a
>>> subset of cores in the system. This is intentional, and
>>> upholds the preexisting behavior on heterogeneous systems
>>> where vCPUs can be scheduled on any core but the guest
>>> counters could stop working.
>>
>> Despite the reference manual says counters may not continuously
>> incrementing, Windows is not robust enough to handle stopped PMCCNTR_EL0
>> and crashes with a division-by-zero error and it also crashes when the
>> PMU is not present.
>>
>> To avoid such a problem, the userspace should pin the vCPU threads to
>> pCPUs supported by one host PMU when initializing the vCPUs or specify
>> the host PMU to use with KVM_ARM_VCPU_PMU_V3_SET_PMU after the
>> initialization. However, QEMU/libvirt can pin vCPU threads only after the
>> vCPUs are initialized. It also limits the pCPUs the guest can use even
>> for VMMs that support proper pinning.
>>
>> Solution
>> --------
>>
>> Ideally, Windows should fix the division-by-zero error and QEMU/libvirt
>> should support pinning better, but neither of them are going to happen
>> anytime soon.
>>
>> To allow running Windows on QEMU/libvirt or with heterogeneous cores,
>> combine all host PMUs necessary to cover the cores vCPUs can run and
>> keep PMCCNTR_EL0 working.
> > I'm extremely uneasy about making this a generalized solution. PMUs are
> deeply tied to the microarchitecture of a particular implementation, and
> that isn't something we can abstract away from the guest in KVM.
>
> For example, you could have an event ID that counts on only a subset of
> cores, or better yet an event that counts something completely different
> depending on where a vCPU lands.> > I do appreciate the issue that you're trying to solve.
>
> The good news though is that the fixed PMU cycle counter is the only
> thing guaranteed to be present in any PMUv3 implementation. Since
> that's the only counter Windows actually needs, perhaps we could
> special-case this in KVM.
>
> I have the following (completely untested) patch, do you want to give it
> a try? There's still going to be observable differences between PMUs
> (e.g. CPU frequency) but at least it should get things booting.
I don't think it will work, unfortunately. perf_init_event() binds a
perf event to a particular PMU so the event will stop working when the
thread migrates away.
It should also be the reason why the perf program creates an event for
each PMU. tools/perf/Documentation/intel-hybrid.txt has more descriptions.
Allowing to enable more than one counter and/or an event type other than
the cycle counter is not the goal. Enabling another event type may
result in a garbage value, but I don't think it's worse than the current
situation where the count stays zero; please tell me if I miss something.
There is still room for improvement. Returning a garbage value may not
be worse than returning zero, but counters and event types not supported
by some cores shouldn't be advertised as available in the first place.
More concretely:
- The vCPU should be limited to run only on cores covered by PMUs when
KVM_ARM_VCPU_PMU_V3 is set.
- PMCR_EL0.N advertised to the guest should be the minimum of ones of
host PMUs.
- PMCEID0_EL0 and PMCEID1_EL0 advertised to the guest should be the
result of the AND operations of ones of host PMUs.
Special-casing the cycle counter may make sense if we are going to fix
the advertised values of PMCR_EL0.N, PMCEID0_EL0, and PMCEID1_EL0.
PMCR_EL0.N as we can simply return zero for these registers. We can also
prevent enabling a counter that returns zero or a garbage value.
Do you think it's worth fixing these registers? If so, I'll do that by
special-casing the cycle counter.
Regards,
Akihiko Odaki
>
> Thanks,
> Oliver
>
> ---
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index a1bc10d7116a..913a7bab50b5 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -724,14 +724,21 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
> return;
>
> memset(&attr, 0, sizeof(struct perf_event_attr));
> - attr.type = arm_pmu->pmu.type;
> +
> + if (pmc->idx == ARMV8_PMU_CYCLE_IDX) {
> + attr.type = PERF_TYPE_HARDWARE;
> + attr.config = PERF_COUNT_HW_CPU_CYCLES;
> + } else {
> + attr.type = arm_pmu->pmu.type;
> + attr.config = eventsel;
> + }
> +
> attr.size = sizeof(attr);
> attr.pinned = 1;
> attr.disabled = !kvm_pmu_counter_is_enabled(pmc);
> attr.exclude_user = !kvm_pmc_counts_at_el0(pmc);
> attr.exclude_hv = 1; /* Don't count EL2 events */
> attr.exclude_host = 1; /* Don't count host events */
> - attr.config = eventsel;
>
> /*
> * Filter events at EL1 (i.e. vEL2) when in a hyp context based on the
Powered by blists - more mailing lists