linux-kernel - Re: [PATCH v2 2/3] perf: Add driver for Arm NI-700 interconnect PMU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zr40Y_9QyRYpG_bQ@J2N7QTR9R3.cambridge.arm.com>
Date: Thu, 15 Aug 2024 18:04:36 +0100
From: Mark Rutland <mark.rutland@....com>
To: Robin Murphy <robin.murphy@....com>
Cc: will@...nel.org, linux-arm-kernel@...ts.infradead.org,
	linux-kernel@...r.kernel.org, jialong.yang@...ngroup.cn
Subject: Re: [PATCH v2 2/3] perf: Add driver for Arm NI-700 interconnect PMU

On Wed, Jul 10, 2024 at 05:09:34PM +0100, Robin Murphy wrote:
> The Arm NI-700 Network-on-Chip Interconnect has a relatively
> straightforward design with a hierarchy of voltage, power, and clock
> domains, where each clock domain then contains a number of interface
> units and a PMU which can monitor events thereon. As such, it begets a
> relatively straightforward driver to interface those PMUs with perf.
> 
> Even more so than with arm-cmn, users will require detailed knowledge of
> the wider system topology in order to meaningfully analyse anything,
> since the interconnect itself cannot know what lies beyond the boundary
> of each inscrutably-numbered interface. Given that, for now they are
> also expected to refer to the NI-700 documentation for the relevant
> event IDs to provide as well. An identifier is implemented so we can
> come back and add jevents if anyone really wants to.

IIUC the relevant documentation would be the NI-700 TRM:

  https://developer.arm.com/documentation/101566/0203/?lang=en

... right?

[...]

> +====================================
> +Arm Network-on Chip Interconnect PMU
> +====================================
> +
> +NI-700 and friends implement a distinct PMU for each clock domain within the
> +interconnect. Correspondingly, the driver exposes multiple PMU devices named
> +arm_ni_<x>_cd_<y>, where <x> is an (abritrary) instance identifier and <y> is
> +the clock domain ID within that particular instance. If multiple NI instances
> +exist within a system, the PMU devices can be correlated with the underlying
> +hardware instance via sysfs parentage.

I suspect that name suffixing *might* confuse userspace in some cases,
since IIUC the perf tool tries to aggregate some_pmu_<number> instances,
and here it would presumably aggregate all the clock domains as a
arm_ni_<x>_cd PMU.

We should check how that behaves and speak to the userspace folk about
this; taking a step back we probably need a better way for determining
when things can or should be aggregated.

I assume we don't have HW to test on, but we should be able to fake up
the sysfs hierarchy and see how "perf list" treats that.

[...]

> +static int arm_ni_validate_group(struct perf_event *event)
> +{
> +	struct perf_event *sibling, *leader;
> +	struct arm_ni_val val = { 0 };
> +
> +	arm_ni_validate_event(event, &val);
> +
> +	leader = event->group_leader;
> +	if (leader != event && !arm_ni_validate_event(leader, &val))
> +		return - EINVAL;
> +
> +	for_each_sibling_event(sibling, leader) {
> +		if (!arm_ni_validate_event(sibling, &val))
> +			return - EINVAL;
> +	}
> +	return 0;
> +}

As a trivial nit, something has gone wrong with spacing and you have:

	return - EINVAL;

... rather than:

	return -EINVAL;

As a more substantial thing, this will trigger splats when lockdep is
enabled and an event is opened where event == event->group_leader,
because for_each_sibling_event(..., event) checks event->ctx->mutex is
held, and event->ctx isn't initialised until pmu::event_init() returns.

That's a latent bug in many PMU drivers at the moment, and is on my TODO
list to fix elsewhere. For now, can you make the above:

| static int arm_ni_validate_group(struct perf_event *event)
| {
| 	struct perf_event *sibling, *leader = event->group_leader;
| 	struct arm_ni_val val = { 0 };
| 
| 	if (event == leader)
| 		return 0;
|
| 	arm_ni_validate_event(event, &val);
| 	
| 	if (!arm_ni_validate_event(leader, &val))
| 		return -EINVAL;
| 
| 	for_each_sibling_event(sibling, leader) {
| 		if (!arm_ni_validate_event(sibling, &val))
| 			return -EINVAL;
| 	}
| 
| 	return 0;
| }

... where the early exit for the (event == leader) case will avoid the
bad call.

[...]

> +static int arm_ni_init_cd(struct arm_ni *ni, struct arm_ni_node *node, u64 res_start)
> +{
> +	struct arm_ni_cd *cd = ni->cds + node->id;
> +	const char *name;
> +	int err;

> +	cd->cpu = cpumask_local_spread(0, dev_to_node(ni->dev));

Can dev_to_node(ni->dev) return NUMA_NO_NODE, and if so, do we need to
error out here? ...

> +static int arm_ni_pmu_online_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
> +{
> +	struct arm_ni_cd *cd;
> +	int node;
> +
> +	cd = hlist_entry_safe(cpuhp_node, struct arm_ni_cd, cpuhp_node);
> +	node = dev_to_node(cd_to_ni(cd)->dev);
> +	if (node != NUMA_NO_NODE && cpu_to_node(cd->cpu) != node && cpu_to_node(cpu) == node)
> +		arm_ni_pmu_migrate(cd, cpu);
> +	return 0;
> +}

... since we expect to handle similar here ...

> +
> +static int arm_ni_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
> +{
> +	struct arm_ni_cd *cd;
> +	unsigned int target;
> +	int node;
> +
> +	cd = hlist_entry_safe(cpuhp_node, struct arm_ni_cd, cpuhp_node);
> +	if (cpu != cd->cpu)
> +		return 0;
> +
> +	node = dev_to_node(cd_to_ni(cd)->dev);
> +	target = cpumask_any_and_but(cpumask_of_node(node), cpu_online_mask, cpu);
> +	if (target >= nr_cpu_ids)
> +		target = cpumask_any_but(cpu_online_mask, cpu);
> +
> +	if (target < nr_cpu_ids)
> +		arm_ni_pmu_migrate(cd, target);
> +	return 0;
> +}

... though not here, and overall that seems inconsistent.

Mark.