[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0a41657e-a52c-43c9-9b73-89fd73a376c7@arm.com>
Date: Mon, 2 Sep 2024 19:47:18 +0100
From: Robin Murphy <robin.murphy@....com>
To: Will Deacon <will@...nel.org>
Cc: mark.rutland@....com, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, jialong.yang@...ngroup.cn
Subject: Re: [PATCH v3 2/3] perf: Add driver for Arm NI-700 interconnect PMU
On 02/09/2024 3:47 pm, Will Deacon wrote:
> Hi Robin,
>
> On Fri, Aug 30, 2024 at 06:19:34PM +0100, Robin Murphy wrote:
>> The Arm NI-700 Network-on-Chip Interconnect has a relatively
>> straightforward design with a hierarchy of voltage, power, and clock
>> domains, where each clock domain then contains a number of interface
>> units and a PMU which can monitor events thereon. As such, it begets a
>> relatively straightforward driver to interface those PMUs with perf.
>>
>> Even more so than with arm-cmn, users will require detailed knowledge of
>> the wider system topology in order to meaningfully analyse anything,
>> since the interconnect itself cannot know what lies beyond the boundary
>> of each inscrutably-numbered interface. Given that, for now they are
>> also expected to refer to the NI-700 documentation for the relevant
>> event IDs to provide as well. An identifier is implemented so we can
>> come back and add jevents if anyone really wants to.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@....com>
>>
>> ---
>> v2:
>> - Add basic usage documentation
>> - Use __counted_by attribute
>> - Make group validation logic clearer (and drop PMU type check
>> which perf_event_open() already takes care of)
>> - Add retry limit to arm_ni_read_ccnt()
>> v3:
>> - Update .remove to return void
>> - Fix group leader validation and make the naming clearer
>> - Drop NUMA_NO_NODE check for CPU online (the only way that could
>> actually pass both other migration conditions is if the NUMA info
>> is so messed up that it's not worth worrying about anyway)
>
> Thanks, this is looking pretty good now. I just have a few random comments
> based on another read-through of the code.
>
>> diff --git a/Documentation/admin-guide/perf/arm-ni.rst b/Documentation/admin-guide/perf/arm-ni.rst
>> new file mode 100644
>> index 000000000000..3cd7d0f75f0f
>> --- /dev/null
>> +++ b/Documentation/admin-guide/perf/arm-ni.rst
>> @@ -0,0 +1,17 @@
>> +====================================
>> +Arm Network-on Chip Interconnect PMU
>> +====================================
>> +
>> +NI-700 and friends implement a distinct PMU for each clock domain within the
>> +interconnect. Correspondingly, the driver exposes multiple PMU devices named
>> +arm_ni_<x>_cd_<y>, where <x> is an (abritrary) instance identifier and <y> is
>
> typo: abritrary
Oops, fixed.
>> +the clock domain ID within that particular instance. If multiple NI instances
>> +exist within a system, the PMU devices can be correlated with the underlying
>> +hardware instance via sysfs parentage.
>> +
>> +Each PMU exposes base event aliases for the interface types present in its clock
>> +domain. These require qualifying with the "eventid" and "nodeid" parameters
>> +to specify the event code to count and the interface at which to count it
>> +(per the configured hardware ID as reflected in the xxNI_NODE_INFO register).
>> +The exception is the "cycles" alias for the PMU cycle counter, which is encoded
>> +with the PMU node type and needs no further qualification.
>
> [...]
>
>> +static ssize_t arm_ni_format_show(struct device *dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + struct arm_ni_format_attr *fmt = container_of(attr, typeof(*fmt), attr);
>> + int lo = __ffs(fmt->field), hi = __fls(fmt->field);
>> +
>> + return sysfs_emit(buf, "config:%d-%d\n", lo, hi);
>> +}
>
> Nit: if you end up adding single-bit config fields in the future, this
> will quietly do the wrong thing. Maybe safe-guard the 'lo==hi' case (even
> if you just warn once and return without doing anything).
The counter-argument is that I don't foresee having any reason to add
single-bit config fields here in future, nor indeed config1 or config2
fields, so I intentionally pruned the would-be dead code while
copy-pasting this implementation from arm-cmn. Yes, if someone were to
make an incomplete change without paying attention or testing they could
introduce a bug, but when is that ever not true?
TBH I was originally hoping to have much of this boilerplate stuff
factored out into a common library beforehand (I also anticipate needing
the IRQ-sharing shenanigans here), but as ever, priorities got in the way...
>
>
> [...]
>
>> +static int arm_ni_init_cd(struct arm_ni *ni, struct arm_ni_node *node, u64 res_start)
>> +{
>> + struct arm_ni_cd *cd = ni->cds + node->id;
>> + const char *name;
>> + int err;
>> +
>> + cd->id = node->id;
>> + cd->num_units = node->num_components;
>> + cd->units = devm_kcalloc(ni->dev, cd->num_units, sizeof(*(cd->units)), GFP_KERNEL);
>> + if (!cd->units)
>> + return -ENOMEM;
>> +
>> + for (int i = 0; i < cd->num_units; i++) {
>> + u32 reg = readl_relaxed(node->base + NI_CHILD_PTR(i));
>> + void __iomem *unit_base = ni->base + reg;
>> + struct arm_ni_unit *unit = cd->units + i;
>> +
>> + reg = readl_relaxed(unit_base + NI_NODE_TYPE);
>> + unit->type = FIELD_GET(NI_NODE_TYPE_NODE_TYPE, reg);
>> + unit->id = FIELD_GET(NI_NODE_TYPE_NODE_ID, reg);
>> +
>> + switch (unit->type) {
>> + case NI_PMU:
>> + reg = readl_relaxed(unit_base + NI_PMCFGR);
>> + if (!reg) {
>> + dev_info(ni->dev, "No access to PMU %d\n", cd->id);
>> + devm_kfree(ni->dev, cd->units);
>> + return 0;
>> + }
>> + unit->ns = true;
>> + cd->pmu_base = unit_base;
>> + break;
>> + case NI_ASNI:
>> + case NI_AMNI:
>> + case NI_HSNI:
>> + case NI_HMNI:
>> + case NI_PMNI:
>> + unit->pmusela = unit_base + NI700_PMUSELA;
>> + writel_relaxed(1, unit->pmusela);
>> + if (readl_relaxed(unit->pmusela) != 1)
>> + dev_info(ni->dev, "No access to node 0x%04x%04x\n", unit->id, unit->type);
>> + else
>> + unit->ns = true;
>> + break;
>> + default:
>> + /*
>> + * e.g. FMU - thankfully bits 3:2 of FMU_ERR_FR0 are RES0 so
>> + * can't alias any of the leaf node types we're looking for.
>> + */
>> + dev_dbg(ni->dev, "Mystery node 0x%04x%04x\n", unit->id, unit->type);
>> + break;
>> + }
>> + }
>> +
>> + res_start += cd->pmu_base - ni->base;
>> + if (!devm_request_mem_region(ni->dev, res_start, SZ_4K, dev_name(ni->dev))) {
>> + dev_err(ni->dev, "Failed to request PMU region 0x%llx\n", res_start);
>> + return -EBUSY;
>> + }
>> +
>> + writel_relaxed(NI_PMCR_RESET_CCNT | NI_PMCR_RESET_EVCNT,
>> + cd->pmu_base + NI_PMCR);
>> + writel_relaxed(U32_MAX, cd->pmu_base + NI_PMCNTENCLR);
>> + writel_relaxed(U32_MAX, cd->pmu_base + NI_PMOVSCLR);
>> + writel_relaxed(U32_MAX, cd->pmu_base + NI_PMINTENSET);
>> +
>> + cd->irq = platform_get_irq(to_platform_device(ni->dev), cd->id);
>> + if (cd->irq < 0)
>> + return cd->irq;
>> +
>> + err = devm_request_irq(ni->dev, cd->irq, arm_ni_handle_irq,
>> + IRQF_NOBALANCING | IRQF_NO_THREAD,
>> + dev_name(ni->dev), cd);
>> + if (err)
>> + return err;
>> +
>> + cd->cpu = cpumask_local_spread(0, dev_to_node(ni->dev));
>> + cd->pmu = (struct pmu) {
>> + .module = THIS_MODULE,
>> + .parent = ni->dev,
>> + .attr_groups = arm_ni_attr_groups,
>> + .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
>> + .task_ctx_nr = perf_invalid_context,
>> + .pmu_enable = arm_ni_pmu_enable,
>> + .pmu_disable = arm_ni_pmu_disable,
>> + .event_init = arm_ni_event_init,
>> + .add = arm_ni_event_add,
>> + .del = arm_ni_event_del,
>> + .start = arm_ni_event_start,
>> + .stop = arm_ni_event_stop,
>> + .read = arm_ni_event_read,
>> + };
>> +
>> + name = devm_kasprintf(ni->dev, GFP_KERNEL, "arm_ni_%d_cd_%d", ni->id, cd->id);
>> + if (!name)
>> + return -ENOMEM;
>> +
>> + err = cpuhp_state_add_instance(arm_ni_hp_state, &cd->cpuhp_node);
>> + if (err)
>> + return err;
>
> What happens if there's a CPU hotplug operation here? Can we end up calling
> perf_pmu_migrate_context() concurrently with perf_pmu_register()?
Yes. Alternatively we could register the PMU before the hotplug handler,
then potentially miss a hotplug event and leave a user-visible PMU
associated with an invalid CPU. This is a known issue for all system PMU
drivers, and the conclusion 5 years ago was that it's impractical to
close this race from outside perf core itself[1][2].
>> + return perf_pmu_register(&cd->pmu, name, -1);
>
> Clean up the cpuhp instance if this fails?
Indeed, not sure how that managed to escape...
Thanks,
Robin.
[1]
https://lore.kernel.org/linux-arm-kernel/cover.1549299188.git.robin.murphy@arm.com/
[2]
https://lore.kernel.org/linux-arm-kernel/cover.1554310292.git.robin.murphy@arm.com/
>> +static void arm_ni_remove(struct platform_device *pdev)
>> +{
>> + struct arm_ni *ni = platform_get_drvdata(pdev);
>> +
>> + for (int i = 0; i < ni->num_cds; i++) {
>> + struct arm_ni_cd *cd = ni->cds + i;
>> +
>> + if (!cd->pmu_base)
>> + continue;
>> +
>> + writel_relaxed(0, cd->pmu_base + NI_PMCR);
>> + writel_relaxed(U32_MAX, cd->pmu_base + NI_PMINTENCLR);
>> + perf_pmu_unregister(&cd->pmu);
>
> Similarly here, it feels like a CPU hotplug operation could cause problems
> here.
>
>> + cpuhp_state_remove_instance_nocalls(arm_ni_hp_state, &cd->cpuhp_node);
>> + }
>> +}
>
> Will
Powered by blists - more mailing lists