Message-ID: <gsnt4iww3406.fsf@coltonlewis-kvm.c.googlers.com>
Date: Tue, 03 Jun 2025 21:32:41 +0000
From: Colton Lewis <coltonlewis@...gle.com>
To: Oliver Upton <oliver.upton@...ux.dev>
Cc: kvm@...r.kernel.org, pbonzini@...hat.com, corbet@....net, 
	linux@...linux.org.uk, catalin.marinas@....com, will@...nel.org, 
	maz@...nel.org, joey.gouly@....com, suzuki.poulose@....com, 
	yuzenghui@...wei.com, mark.rutland@....com, shuah@...nel.org, 
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev, 
	linux-perf-users@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH 06/17] KVM: arm64: Introduce method to partition the PMU

Oliver Upton <oliver.upton@...ux.dev> writes:

> On Mon, Jun 02, 2025 at 07:26:51PM +0000, Colton Lewis wrote:
>>   static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
>>   {
>> +	u8 hpmn = vcpu->kvm->arch.arm_pmu->hpmn;
>> +
>>   	preempt_disable();

>>   	/*
>>   	 * This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK
>>   	 * to disable guest access to the profiling and trace buffers
>>   	 */
>> -	vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN,
>> -					 *host_data_ptr(nr_event_counters));
>> -	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
>> +	vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, hpmn);
>> +	vcpu->arch.mdcr_el2 |= (MDCR_EL2_HPMD |
>> +				MDCR_EL2_TPM |

> This isn't safe, as there's no guarantee that kvm_arch::arm_pmu is
> pointing at the PMU for this CPU. KVM needs to derive HPMN from some
> per-CPU state, not anything tied to the VM/vCPU.

I'm confused. Isn't this function preparing to run the vCPU on this
CPU? Why would it be pointing at a different PMU?

And HPMN is something that we only want set when running a vCPU, so
there isn't any per-CPU state saying it should be anything but the
default value (number of counters) outside that context.

Unless you just mean I should check the number of counters again and
make sure HPMN is not an invalid value.
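
To make that last option concrete, here is a rough userspace sketch
(not code from the patch) of re-checking a reservation against the
counter count of the CPU at hand. It assumes HPMN is simply the number
of implemented event counters minus the host reservation; the real
kvm_pmu_hpmn() is not shown in this message, and sketch_checked_hpmn()
and SKETCH_MAX_EVENT_COUNTERS are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* PMUv3 implements at most 31 event counters (assumed bound here). */
#define SKETCH_MAX_EVENT_COUNTERS 31

/*
 * Hypothetical helper: derive HPMN for the current CPU.
 * Assumes HPMN = nr_counters - host_counters.
 * Returns -1 if the reservation does not fit this CPU's PMU.
 * A reservation of 0 yields HPMN == nr_counters, i.e. the default
 * unpartitioned value.
 */
static int sketch_checked_hpmn(uint8_t nr_counters, uint8_t host_counters)
{
	assert(nr_counters <= SKETCH_MAX_EVENT_COUNTERS);

	if (host_counters > nr_counters)
		return -1;

	return nr_counters - host_counters;
}
```

With 6 implemented counters, reserving 2 for the host gives HPMN 4,
reserving 0 gives the default of 6, and reserving 7 is rejected.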

>> +/**
>> + * kvm_pmu_partition() - Partition the PMU
>> + * @pmu: Pointer to pmu being partitioned
>> + * @host_counters: Number of host counters to reserve
>> + *
>> + * Partition the given PMU by taking a number of host counters to
>> + * reserve and, if it is a valid reservation, recording the
>> + * corresponding HPMN value in the hpmn field of the PMU and clearing
>> + * the guest-reserved counters from the counter mask.
>> + *
>> + * Passing 0 for @host_counters has the effect of disabling partitioning.
>> + *
>> + * Return: 0 on success, -ERROR otherwise
>> + */
>> +int kvm_pmu_partition(struct arm_pmu *pmu, u8 host_counters)
>> +{
>> +	u8 nr_counters;
>> +	u8 hpmn;
>> +
>> +	if (!kvm_pmu_reservation_is_valid(host_counters))
>> +		return -EINVAL;
>> +
>> +	nr_counters = *host_data_ptr(nr_event_counters);
>> +	hpmn = kvm_pmu_hpmn(host_counters);
>> +
>> +	if (hpmn < nr_counters) {
>> +		pmu->hpmn = hpmn;
>> +		/* Inform host driver of available counters */
>> +		bitmap_clear(pmu->cntr_mask, 0, hpmn);
>> +		bitmap_set(pmu->cntr_mask, hpmn, nr_counters);
>> +		clear_bit(ARMV8_PMU_CYCLE_IDX, pmu->cntr_mask);
>> +		if (pmuv3_has_icntr())
>> +			clear_bit(ARMV8_PMU_INSTR_IDX, pmu->cntr_mask);
>> +
>> +		kvm_debug("Partitioned PMU with HPMN %u", hpmn);
>> +	} else {
>> +		pmu->hpmn = nr_counters;
>> +		bitmap_set(pmu->cntr_mask, 0, nr_counters);
>> +		set_bit(ARMV8_PMU_CYCLE_IDX, pmu->cntr_mask);
>> +		if (pmuv3_has_icntr())
>> +			set_bit(ARMV8_PMU_INSTR_IDX, pmu->cntr_mask);
>> +
>> +		kvm_debug("Unpartitioned PMU");
>> +	}
>> +
>> +	return 0;
>> +}

> Hmm... Just in terms of code organization I'm not sure I like having KVM
> twiddling with *host* support for PMUv3. Feels like the ARM PMU driver
> should own partitioning and KVM just takes what it can get.

Okay. I can move the code.

>> @@ -239,6 +245,13 @@ void kvm_host_pmu_init(struct arm_pmu *pmu)
>>   	if (!pmuv3_implemented(kvm_arm_pmu_get_pmuver_limit()))
>>   		return;

>> +	if (reserved_host_counters) {
>> +		if (kvm_pmu_partition_supported())
>> +			WARN_ON(kvm_pmu_partition(pmu, reserved_host_counters));
>> +		else
>> +			kvm_err("PMU Partition is not supported");
>> +	}
>> +

> Hasn't the ARM PMU been registered with perf at this point? Surely the
> driver wouldn't be very pleased with us ripping counters out from under
> its feet.

AFAICT nothing in perf registration cares about the number of counters
the PMU has. The PMUv3 driver tracks its own available counters through
cntr_mask and I modify that during partition.

Since this is still initialization of the PMU, I don't believe anything
has had a chance to use a counter that would then be ripped away.
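
For illustration only, the effect on the host's view of the counters
can be sketched in userspace with a plain 64-bit mask standing in for
cntr_mask. This mirrors the intent of the quoted kvm_pmu_partition()
(guest gets counters below HPMN, host keeps the rest, fixed counters
are dropped from the host mask when partitioned). The names and the
index values 31 and 32 for the cycle and instruction counters are
assumptions of the sketch, not taken from this message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-counter bit positions, for illustration only. */
#define SKETCH_CYCLE_IDX 31
#define SKETCH_INSTR_IDX 32

/*
 * Hypothetical helper: compute the host-visible counter mask.
 * Event counters occupy bits [0, nr_counters).
 */
static uint64_t sketch_host_cntr_mask(unsigned int nr_counters,
				      unsigned int hpmn, int has_icntr)
{
	uint64_t all, guest, fixed;

	assert(nr_counters <= 31);

	all = (UINT64_C(1) << nr_counters) - 1;
	fixed = UINT64_C(1) << SKETCH_CYCLE_IDX;
	if (has_icntr)
		fixed |= UINT64_C(1) << SKETCH_INSTR_IDX;

	/* Unpartitioned: host owns every event counter plus fixed ones. */
	if (hpmn >= nr_counters)
		return all | fixed;

	/* Partitioned: bits [0, hpmn) go to the guest, host keeps
	 * [hpmn, nr_counters) and gives up the fixed counters. */
	guest = (UINT64_C(1) << hpmn) - 1;
	return all & ~guest;
}
```

Note that the sketch sets exactly the bits in [hpmn, nr_counters). A
helper that takes a start and a length, as the kernel's bitmap_set()
does, would need nr_counters - hpmn as the length to get the same
range.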

Aesthetically, it makes sense to change this if I move the partitioning
code to the PMUv3 driver, but I think it's inconsequential to the
function.
