Open Source and information security mailing list archives
 
Message-ID: <e33c21eb-d4d3-4c38-aab8-60399c7ae210@rsg.ci.i.u-tokyo.ac.jp>
Date: Sat, 9 Aug 2025 15:15:40 +0900
From: Akihiko Odaki <odaki@....ci.i.u-tokyo.ac.jp>
To: Oliver Upton <oliver.upton@...ux.dev>
Cc: Marc Zyngier <maz@...nel.org>, Joey Gouly <joey.gouly@....com>,
        Suzuki K Poulose <suzuki.poulose@....com>,
        Zenghui Yu <yuzenghui@...wei.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>, Kees Cook <kees@...nel.org>,
        "Gustavo A. R. Silva" <gustavoars@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>, Jonathan Corbet <corbet@....net>,
        Shuah Khan <shuah@...nel.org>, linux-arm-kernel@...ts.infradead.org,
        kvmarm@...ts.linux.dev, linux-kernel@...r.kernel.org,
        linux-hardening@...r.kernel.org, devel@...nix.com, kvm@...r.kernel.org,
        linux-doc@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH RFC v2 1/2] KVM: arm64: PMU: Introduce
 KVM_ARM_VCPU_PMU_V3_COMPOSITION

On 2025/08/09 8:08, Oliver Upton wrote:
> On Thu, Aug 07, 2025 at 11:06:21PM +0900, Akihiko Odaki wrote:
>>> The only cross-PMU events we will support are the fixed counters, my
>>> strong preference is that we do not reverse-map architectural events to
>>> generic perf events for all counters.
>>
>> I wonder if there is a benefit to special case PERF_COUNT_HW_CPU_CYCLES
>> then; the current logic of kvm_map_pmu_event() looks sufficient for me.
> 
> I'd rather we just use the generic perf events and let the driver remap
> things on our behalf. These are fixed counters, using constant events
> feels like the right way to go about that.
> 
> kvm_map_pmu_event() is trying to solve a slightly different problem
> where we need to map programmable PMUv3 events into a non-PMUv3 event
> space, like on the M1 PMU.

It is currently also used to map non-programmable PMUv3 events.

I want to understand the motivation better. The current procedure to 
determine the config value is as follows:
1) If the register is PMCCFILTR_EL0:
    a) eventsel = ARMV8_PMUV3_PERFCTR_CPU_CYCLES.
2) If the register is not PMCCFILTR_EL0:
    a) Derive eventsel by masking the register value.
3) If map_pmuv3_event() exists:
    a) The config value is map_pmuv3_event(eventsel).
4) If map_pmuv3_event() does not exist:
    a) The config value is eventsel.

If we use PERF_TYPE_HARDWARE / PERF_COUNT_HW_CPU_CYCLES, the procedure 
will look like the following:
1) If the register is PMCCFILTR_EL0:
    a) The config value is PERF_TYPE_HARDWARE / PERF_COUNT_HW_CPU_CYCLES.
2) If the register is not PMCCFILTR_EL0:
    a) Derive eventsel by masking the register value.
    b) If map_pmuv3_event() exists:
       i) The config value is map_pmuv3_event(eventsel).
    c) If map_pmuv3_event() does not exist:
       i) The config value is eventsel.

It does not seem that using PERF_TYPE_HARDWARE / 
PERF_COUNT_HW_CPU_CYCLES simplifies the procedure.
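
To make the comparison concrete, here is a rough sketch of both procedures in plain C. It is not the actual KVM code: EVENT_MASK, struct config, demo_map(), and both function names are hypothetical, and the error handling for a failing map_pmuv3_event() is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARMV8_PMUV3_PERFCTR_CPU_CYCLES 0x11 /* architectural event number */
#define PERF_COUNT_HW_CPU_CYCLES 0          /* first entry of enum perf_hw_id */
#define EVENT_MASK 0xffffu /* assumption: the real mask depends on the PMU version */

/* Same shape as the driver's optional map_pmuv3_event() callback. */
typedef int (*map_fn)(unsigned int eventsel);

/* Hypothetical result type: generic == true stands for PERF_TYPE_HARDWARE. */
struct config {
	bool generic;
	uint64_t value;
};

/* Hypothetical stand-in for a driver remapping PMUv3 events (e.g. M1 PMU). */
int demo_map(unsigned int eventsel)
{
	return (int)eventsel + 0x100;
}

/* Current procedure: every event, including cycles, takes the same path. */
struct config config_current(bool is_pmccfiltr, uint64_t reg, map_fn map)
{
	struct config c = { .generic = false };
	unsigned int eventsel = is_pmccfiltr ? ARMV8_PMUV3_PERFCTR_CPU_CYCLES
					     : (unsigned int)(reg & EVENT_MASK);

	c.value = map ? (uint64_t)map(eventsel) : eventsel;
	return c;
}

/* Alternative: the cycle counter becomes a special case with a generic event. */
struct config config_generic_cycles(bool is_pmccfiltr, uint64_t reg, map_fn map)
{
	struct config c = { .generic = false };
	unsigned int eventsel;

	if (is_pmccfiltr) {
		c.generic = true;
		c.value = PERF_COUNT_HW_CPU_CYCLES;
		return c;
	}

	eventsel = (unsigned int)(reg & EVENT_MASK);
	c.value = map ? (uint64_t)map(eventsel) : eventsel;
	return c;
}
```

In this sketch the second function still needs the whole eventsel path for the programmable counters and adds one early return on top of it.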

> 
>>> This isn't what I meant. What I mean is that userspace either can use
>>> the SET_PMU ioctl or the COMPOSITION ioctl. Once one of them has been
>>> used the other ioctl returns an error.
>>>
>>> We're really bad at getting ioctl ordering / interleaving right and
>>> syzkaller has a habit of finding these mistakes. There's zero practical
>>> value in using both of these ioctls on the same VM, let's prevent it.
>>
>> The corresponding RFC series for QEMU uses KVM_ARM_VCPU_PMU_V3_SET_PMU to
>> probe host PMUs, and falls back to KVM_ARM_VCPU_PMU_V3_COMPOSITION if none
>> covers all CPUs. Switching between SET_PMU and COMPOSITION is useful during
>> such probing.
>>
>> COMPOSITION is designed to behave like just another host PMU that is set
>> with SET_PMU. SET_PMU allows setting a different host PMU even if SET_PMU
>> has already been invoked so it is also allowed to set a host PMU even if
>> COMPOSITION has already been invoked, maintaining consistency with
>> non-composed PMUs.
>>
>> You can find the QEMU patch at:
>> https://lore.kernel.org/qemu-devel/20250806-kvmq-v1-1-d1d50b7058cd@rsg.ci.i.u-tokyo.ac.jp/
>>
>> (look up KVM_ARM_VCPU_PMU_V3_SET_PMU for the probing code)
> 
> Having both of these attributes return success when probed with
> KVM_HAS_DEVICE_ATTR is fine; what I mean is that once KVM_SET_DEVICE_ATTR
> has been called on an attribute the other fails.

By probing, I meant checking if a host PMU is compatible with KVM.

More concretely, QEMU implements the following procedure to detect a PMU 
backend compatible with all host CPUs:

1) Traverse /sys/bus/event_source/devices
    a) Check if the device has the cpus and type attributes.
       If it doesn't, skip it.
    b) Try to set the device's type with KVM_ARM_VCPU_PMU_V3_SET_PMU.
       If successful, the device is compatible with KVM.
    c) Check if the device's cpus cover all host CPUs.
       If they do, use it with KVM_ARM_VCPU_PMU_V3_SET_PMU.

2) Check if the union of the cpus attributes of compatible devices
    covers all CPUs. If it does, use KVM_ARM_VCPU_PMU_V3_COMPOSITION.

3) If no usable backend has been found by this step,
    there is no PMU backend compatible with all host CPUs.

Here, 1b) calls KVM_SET_DEVICE_ATTR with KVM_ARM_VCPU_PMU_V3_SET_PMU 
during probing.
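
The decision logic of that procedure can be sketched roughly as below. This is not the QEMU patch itself: the sysfs traversal and the ioctl are replaced by pre-filled fields, and struct pmu_dev, pick_backend(), the example arrays, and the 64-bit CPU mask are all hypothetical simplifications. all_cpus is assumed to be non-zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum backend { BACKEND_NONE, BACKEND_SET_PMU, BACKEND_COMPOSITION };

/* Hypothetical summary of one entry under /sys/bus/event_source/devices. */
struct pmu_dev {
	bool has_cpus_and_type; /* 1a) both attributes are present */
	bool kvm_accepts;       /* 1b) outcome of the SET_PMU attempt; the real
				 *     flow issues KVM_SET_DEVICE_ATTR here */
	uint64_t cpus;          /* parsed cpus attribute, one bit per CPU */
};

/*
 * Returns the backend to use. If a single device covers all CPUs and
 * chosen is not NULL, *chosen is set to that device's index.
 */
enum backend pick_backend(const struct pmu_dev *devs, size_t n,
			  uint64_t all_cpus, size_t *chosen)
{
	uint64_t covered = 0;

	for (size_t i = 0; i < n; i++) {
		if (!devs[i].has_cpus_and_type) /* 1a) skip */
			continue;
		if (!devs[i].kvm_accepts)       /* 1b) not compatible with KVM */
			continue;
		if ((devs[i].cpus & all_cpus) == all_cpus) { /* 1c) */
			if (chosen)
				*chosen = i;
			return BACKEND_SET_PMU;
		}
		covered |= devs[i].cpus; /* remember for step 2 */
	}

	if ((covered & all_cpus) == all_cpus) /* 2) union covers all CPUs */
		return BACKEND_COMPOSITION;

	return BACKEND_NONE; /* 3) */
}

/* Example inputs: two compatible PMUs that each cover half of eight CPUs. */
const struct pmu_dev example_big_little[] = {
	{ true, true, 0x0f },
	{ true, true, 0xf0 },
};

/* Example inputs: an incompatible device followed by one covering all CPUs. */
const struct pmu_dev example_uniform[] = {
	{ true, false, 0xff },
	{ true, true, 0xff },
};
```

With example_big_little, every device passes 1b) before step 2 selects COMPOSITION, so the SET_PMU attempts necessarily precede it.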

> 
>>> On a system that has FEAT_PMUv3_ICNTR, userspace can still use this
>>> ioctl and explicitly de-feature ICNTR by writing to the ID register
>>> after initialization.
>>
>> Now I understand better.
>>
>> Currently, KVM_ARM_VCPU_PMU_V3_COMPOSITION sets supported_cpus to ones that
>> have cycle counters compatible with PMU emulation.
>>
>> If FEAT_PMUv3_ICNTR is set to the ID register, I guess
>> KVM_ARM_VCPU_PMU_V3_COMPOSITION will set supported_cpus to ones that have
>> compatible cycle and instruction counters. If so, the naming
>> KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY indeed makes sense.
> 
> Perfect. Ideally SOC vendors do the sensible thing and ensure that
> FEAT_PMUv3_ICNTR is consistent on all implementations in a machine. We
> will hide the feature in KVM if it is not.

M1 PMU also implements a fixed instruction counter, fortunately on all 
CPUs. I hope they continue to do so (and ideally they implement 
FEAT_PMUv3 and FEAT_PMUv3_ICNTR).

Regards,
Akihiko Odaki
