linux-kernel - Re: [PATCH v3 2/3] perf: Add driver for Arm NI-700 interconnect PMU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0a41657e-a52c-43c9-9b73-89fd73a376c7@arm.com>
Date: Mon, 2 Sep 2024 19:47:18 +0100
From: Robin Murphy <robin.murphy@....com>
To: Will Deacon <will@...nel.org>
Cc: mark.rutland@....com, linux-arm-kernel@...ts.infradead.org,
 linux-kernel@...r.kernel.org, jialong.yang@...ngroup.cn
Subject: Re: [PATCH v3 2/3] perf: Add driver for Arm NI-700 interconnect PMU

On 02/09/2024 3:47 pm, Will Deacon wrote:
> Hi Robin,
> 
> On Fri, Aug 30, 2024 at 06:19:34PM +0100, Robin Murphy wrote:
>> The Arm NI-700 Network-on-Chip Interconnect has a relatively
>> straightforward design with a hierarchy of voltage, power, and clock
>> domains, where each clock domain then contains a number of interface
>> units and a PMU which can monitor events thereon. As such, it begets a
>> relatively straightforward driver to interface those PMUs with perf.
>>
>> Even more so than with arm-cmn, users will require detailed knowledge of
>> the wider system topology in order to meaningfully analyse anything,
>> since the interconnect itself cannot know what lies beyond the boundary
>> of each inscrutably-numbered interface. Given that, for now they are
>> also expected to refer to the NI-700 documentation for the relevant
>> event IDs to provide as well. An identifier is implemented so we can
>> come back and add jevents if anyone really wants to.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@....com>
>>
>> ---
>> v2:
>>   - Add basic usage documentation
>>   - Use __counted_by attribute
>>   - Make group validation logic clearer (and drop PMU type check
>>     which perf_event_open() already takes care of)
>>   - Add retry limit to arm_ni_read_ccnt()
>> v3:
>>   - Update .remove to return void
>>   - Fix group leader validation and make the naming clearer
>>   - Drop NUMA_NO_NODE check for CPU online (the only way that could
>>     actually pass both other migration conditions is if the NUMA info
>>     is so messed up that it's not worth worrying about anyway)
> 
> Thanks, this is looking pretty good now. I just have a few random comments
> based on another read-through of the code.
> 
>> diff --git a/Documentation/admin-guide/perf/arm-ni.rst b/Documentation/admin-guide/perf/arm-ni.rst
>> new file mode 100644
>> index 000000000000..3cd7d0f75f0f
>> --- /dev/null
>> +++ b/Documentation/admin-guide/perf/arm-ni.rst
>> @@ -0,0 +1,17 @@
>> +====================================
>> +Arm Network-on Chip Interconnect PMU
>> +====================================
>> +
>> +NI-700 and friends implement a distinct PMU for each clock domain within the
>> +interconnect. Correspondingly, the driver exposes multiple PMU devices named
>> +arm_ni_<x>_cd_<y>, where <x> is an (abritrary) instance identifier and <y> is
> 
> typo: abritrary

Oops, fixed.

>> +the clock domain ID within that particular instance. If multiple NI instances
>> +exist within a system, the PMU devices can be correlated with the underlying
>> +hardware instance via sysfs parentage.
>> +
>> +Each PMU exposes base event aliases for the interface types present in its clock
>> +domain. These require qualifying with the "eventid" and "nodeid" parameters
>> +to specify the event code to count and the interface at which to count it
>> +(per the configured hardware ID as reflected in the xxNI_NODE_INFO register).
>> +The exception is the "cycles" alias for the PMU cycle counter, which is encoded
>> +with the PMU node type and needs no further qualification.
> 
> [...]
> 
>> +static ssize_t arm_ni_format_show(struct device *dev,
>> +				  struct device_attribute *attr, char *buf)
>> +{
>> +	struct arm_ni_format_attr *fmt = container_of(attr, typeof(*fmt), attr);
>> +	int lo = __ffs(fmt->field), hi = __fls(fmt->field);
>> +
>> +	return sysfs_emit(buf, "config:%d-%d\n", lo, hi);
>> +}
> 
> Nit: if you end up adding single-bit config fields in the future, this
> will quietly do the wrong thing. Maybe safe-guard the 'lo==hi' case (even
> if you just warn once and return without doing anything).

The counter-argument is that I don't foresee having any reason to add 
single-bit config fields here in future, nor indeed config1 or config2 
fields, so I intentionally pruned the would-be dead code while 
copy-pasting this implementation from arm-cmn. Yes, if someone were to 
make an incomplete change without paying attention or testing they could 
introduce a bug, but when is that ever not true?

TBH I was originally hoping to have much of this boilerplate stuff 
factored out into a common library beforehand (I also anticipate needing 
the IRQ-sharing shenanigans here), but as ever, priorities got in the way...

> 
> 
> [...]
> 
>> +static int arm_ni_init_cd(struct arm_ni *ni, struct arm_ni_node *node, u64 res_start)
>> +{
>> +	struct arm_ni_cd *cd = ni->cds + node->id;
>> +	const char *name;
>> +	int err;
>> +
>> +	cd->id = node->id;
>> +	cd->num_units = node->num_components;
>> +	cd->units = devm_kcalloc(ni->dev, cd->num_units, sizeof(*(cd->units)), GFP_KERNEL);
>> +	if (!cd->units)
>> +		return -ENOMEM;
>> +
>> +	for (int i = 0; i < cd->num_units; i++) {
>> +		u32 reg = readl_relaxed(node->base + NI_CHILD_PTR(i));
>> +		void __iomem *unit_base = ni->base + reg;
>> +		struct arm_ni_unit *unit = cd->units + i;
>> +
>> +		reg = readl_relaxed(unit_base + NI_NODE_TYPE);
>> +		unit->type = FIELD_GET(NI_NODE_TYPE_NODE_TYPE, reg);
>> +		unit->id = FIELD_GET(NI_NODE_TYPE_NODE_ID, reg);
>> +
>> +		switch (unit->type) {
>> +		case NI_PMU:
>> +			reg = readl_relaxed(unit_base + NI_PMCFGR);
>> +			if (!reg) {
>> +				dev_info(ni->dev, "No access to PMU %d\n", cd->id);
>> +				devm_kfree(ni->dev, cd->units);
>> +				return 0;
>> +			}
>> +			unit->ns = true;
>> +			cd->pmu_base = unit_base;
>> +			break;
>> +		case NI_ASNI:
>> +		case NI_AMNI:
>> +		case NI_HSNI:
>> +		case NI_HMNI:
>> +		case NI_PMNI:
>> +			unit->pmusela = unit_base + NI700_PMUSELA;
>> +			writel_relaxed(1, unit->pmusela);
>> +			if (readl_relaxed(unit->pmusela) != 1)
>> +				dev_info(ni->dev, "No access to node 0x%04x%04x\n", unit->id, unit->type);
>> +			else
>> +				unit->ns = true;
>> +			break;
>> +		default:
>> +			/*
>> +			 * e.g. FMU - thankfully bits 3:2 of FMU_ERR_FR0 are RES0 so
>> +			 * can't alias any of the leaf node types we're looking for.
>> +			 */
>> +			dev_dbg(ni->dev, "Mystery node 0x%04x%04x\n", unit->id, unit->type);
>> +			break;
>> +		}
>> +	}
>> +
>> +	res_start += cd->pmu_base - ni->base;
>> +	if (!devm_request_mem_region(ni->dev, res_start, SZ_4K, dev_name(ni->dev))) {
>> +		dev_err(ni->dev, "Failed to request PMU region 0x%llx\n", res_start);
>> +		return -EBUSY;
>> +	}
>> +
>> +	writel_relaxed(NI_PMCR_RESET_CCNT | NI_PMCR_RESET_EVCNT,
>> +		       cd->pmu_base + NI_PMCR);
>> +	writel_relaxed(U32_MAX, cd->pmu_base + NI_PMCNTENCLR);
>> +	writel_relaxed(U32_MAX, cd->pmu_base + NI_PMOVSCLR);
>> +	writel_relaxed(U32_MAX, cd->pmu_base + NI_PMINTENSET);
>> +
>> +	cd->irq = platform_get_irq(to_platform_device(ni->dev), cd->id);
>> +	if (cd->irq < 0)
>> +		return cd->irq;
>> +
>> +	err = devm_request_irq(ni->dev, cd->irq, arm_ni_handle_irq,
>> +			       IRQF_NOBALANCING | IRQF_NO_THREAD,
>> +			       dev_name(ni->dev), cd);
>> +	if (err)
>> +		return err;
>> +
>> +	cd->cpu = cpumask_local_spread(0, dev_to_node(ni->dev));
>> +	cd->pmu = (struct pmu) {
>> +		.module = THIS_MODULE,
>> +		.parent = ni->dev,
>> +		.attr_groups = arm_ni_attr_groups,
>> +		.capabilities = PERF_PMU_CAP_NO_EXCLUDE,
>> +		.task_ctx_nr = perf_invalid_context,
>> +		.pmu_enable = arm_ni_pmu_enable,
>> +		.pmu_disable = arm_ni_pmu_disable,
>> +		.event_init = arm_ni_event_init,
>> +		.add = arm_ni_event_add,
>> +		.del = arm_ni_event_del,
>> +		.start = arm_ni_event_start,
>> +		.stop = arm_ni_event_stop,
>> +		.read = arm_ni_event_read,
>> +	};
>> +
>> +	name = devm_kasprintf(ni->dev, GFP_KERNEL, "arm_ni_%d_cd_%d", ni->id, cd->id);
>> +	if (!name)
>> +		return -ENOMEM;
>> +
>> +	err = cpuhp_state_add_instance(arm_ni_hp_state, &cd->cpuhp_node);
>> +	if (err)
>> +		return err;
> 
> What happens if there's a CPU hotplug operation here? Can we end up calling
> perf_pmu_migrate_context() concurrently with perf_pmu_register()?

Yes. Alternatively we could register the PMU before the hotplug handler, 
then potentially miss a hotplug event and leave a user-visible PMU 
associated with an invalid CPU. This is a known issue for all system PMU 
drivers, and the conclusion 5 years ago was that it's impractical to 
close this race from outside perf core itself[1][2].

>> +	return perf_pmu_register(&cd->pmu, name, -1);
> 
> Clean up the cpuhp instance if this fails?

Indeed, not sure how that managed to escape...

Thanks,
Robin.

[1] 
https://lore.kernel.org/linux-arm-kernel/cover.1549299188.git.robin.murphy@arm.com/
[2] 
https://lore.kernel.org/linux-arm-kernel/cover.1554310292.git.robin.murphy@arm.com/

>> +static void arm_ni_remove(struct platform_device *pdev)
>> +{
>> +	struct arm_ni *ni = platform_get_drvdata(pdev);
>> +
>> +	for (int i = 0; i < ni->num_cds; i++) {
>> +		struct arm_ni_cd *cd = ni->cds + i;
>> +
>> +		if (!cd->pmu_base)
>> +			continue;
>> +
>> +		writel_relaxed(0, cd->pmu_base + NI_PMCR);
>> +		writel_relaxed(U32_MAX, cd->pmu_base + NI_PMINTENCLR);
>> +		perf_pmu_unregister(&cd->pmu);
> 
> Similarly here, it feels like a CPU hotplug operation could cause problems
> here.
> 
>> +		cpuhp_state_remove_instance_nocalls(arm_ni_hp_state, &cd->cpuhp_node);
>> +	}
>> +}
> 
> Will