lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <e8324d9d-3756-41cf-a102-28572e302368@daynix.com> Date: Wed, 19 Mar 2025 17:37:29 +0900 From: Akihiko Odaki <akihiko.odaki@...nix.com> To: Oliver Upton <oliver.upton@...ux.dev> Cc: Marc Zyngier <maz@...nel.org>, Joey Gouly <joey.gouly@....com>, Suzuki K Poulose <suzuki.poulose@....com>, Zenghui Yu <yuzenghui@...wei.com>, Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>, Kees Cook <kees@...nel.org>, "Gustavo A. R. Silva" <gustavoars@...nel.org>, linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev, linux-kernel@...r.kernel.org, linux-hardening@...r.kernel.org, devel@...nix.com Subject: Re: [PATCH RFC] KVM: arm64: PMU: Use multiple host PMUs On 2025/03/19 16:34, Oliver Upton wrote: > Hi Akihiko, > > On Wed, Mar 19, 2025 at 03:33:46PM +0900, Akihiko Odaki wrote: >> Problem >> ------- >> >> arch/arm64/kvm/pmu-emul.c used to have a comment saying the follows: >>> The observant among you will notice that the supported_cpus >>> mask does not get updated for the default PMU even though it >>> is quite possible the selected instance supports only a >>> subset of cores in the system. This is intentional, and >>> upholds the preexisting behavior on heterogeneous systems >>> where vCPUs can be scheduled on any core but the guest >>> counters could stop working. >> >> Despite the reference manual says counters may not continuously >> incrementing, Windows is not robust enough to handle stopped PMCCNTR_EL0 >> and crashes with a division-by-zero error and it also crashes when the >> PMU is not present. >> >> To avoid such a problem, the userspace should pin the vCPU threads to >> pCPUs supported by one host PMU when initializing the vCPUs or specify >> the host PMU to use with KVM_ARM_VCPU_PMU_V3_SET_PMU after the >> initialization. However, QEMU/libvirt can pin vCPU threads only after the >> vCPUs are initialized. It also limits the pCPUs the guest can use even >> for VMMs that support proper pinning. >> >> Solution >> -------- >> >> Ideally, Windows should fix the division-by-zero error and QEMU/libvirt >> should support pinning better, but neither of them are going to happen >> anytime soon. >> >> To allow running Windows on QEMU/libvirt or with heterogeneous cores, >> combine all host PMUs necessary to cover the cores vCPUs can run and >> keep PMCCNTR_EL0 working. > > I'm extremely uneasy about making this a generalized solution. PMUs are > deeply tied to the microarchitecture of a particular implementation, and > that isn't something we can abstract away from the guest in KVM. > > For example, you could have an event ID that counts on only a subset of > cores, or better yet an event that counts something completely different > depending on where a vCPU lands.> > I do appreciate the issue that you're trying to solve. > > The good news though is that the fixed PMU cycle counter is the only > thing guaranteed to be present in any PMUv3 implementation. Since > that's the only counter Windows actually needs, perhaps we could > special-case this in KVM. > > I have the following (completely untested) patch, do you want to give it > a try? There's still going to be observable differences between PMUs > (e.g. CPU frequency) but at least it should get things booting. I don't think it will work, unfortunately. perf_init_event() binds a perf event to a particular PMU so the event will stop working when the thread migrates away. It should also be the reason why the perf program creates an event for each PMU. tools/perf/Documentation/intel-hybrid.txt has more descriptions. Allowing to enable more than one counter and/or an event type other than the cycle counter is not the goal. Enabling another event type may result in a garbage value, but I don't think it's worse than the current situation where the count stays zero; please tell me if I miss something. There is still room for improvement. Returning a garbage value may not be worse than returning zero, but counters and event types not supported by some cores shouldn't be advertised as available in the first place. More concretely: - The vCPU should be limited to run only on cores covered by PMUs when KVM_ARM_VCPU_PMU_V3 is set. - PMCR_EL0.N advertised to the guest should be the minimum of ones of host PMUs. - PMCEID0_EL0 and PMCEID1_EL0 advertised to the guest should be the result of the AND operations of ones of host PMUs. Special-casing the cycle counter may make sense if we are going to fix the advertised values of PMCR_EL0.N, PMCEID0_EL0, and PMCEID1_EL0. PMCR_EL0.N as we can simply return zero for these registers. We can also prevent enabling a counter that returns zero or a garbage value. Do you think it's worth fixing these registers? If so, I'll do that by special-casing the cycle counter. Regards, Akihiko Odaki > > Thanks, > Oliver > > --- > diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c > index a1bc10d7116a..913a7bab50b5 100644 > --- a/arch/arm64/kvm/pmu-emul.c > +++ b/arch/arm64/kvm/pmu-emul.c > @@ -724,14 +724,21 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc) > return; > > memset(&attr, 0, sizeof(struct perf_event_attr)); > - attr.type = arm_pmu->pmu.type; > + > + if (pmc->idx == ARMV8_PMU_CYCLE_IDX) { > + attr.type = PERF_TYPE_HARDWARE; > + attr.config = PERF_COUNT_HW_CPU_CYCLES; > + } else { > + attr.type = arm_pmu->pmu.type; > + attr.config = eventsel; > + } > + > attr.size = sizeof(attr); > attr.pinned = 1; > attr.disabled = !kvm_pmu_counter_is_enabled(pmc); > attr.exclude_user = !kvm_pmc_counts_at_el0(pmc); > attr.exclude_hv = 1; /* Don't count EL2 events */ > attr.exclude_host = 1; /* Don't count host events */ > - attr.config = eventsel; > > /* > * Filter events at EL1 (i.e. vEL2) when in a hyp context based on the
Powered by blists - more mailing lists