[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fef66164-1238-45e4-b70c-c565caa2cf75@kernel.org>
Date: Wed, 31 Jan 2024 10:38:39 +0100
From: Krzysztof Kozlowski <krzk@...nel.org>
To: Yang Jialong 杨佳龙 <jialong.yang@...ngroup.cn>,
Will Deacon <will@...nel.org>, Mark Rutland <mark.rutland@....com>
Cc: shenghui.qu@...ngroup.cn, ke.zhao@...ngroup.cn, zhijie.ren@...ngroup.cn,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v2] perf/hx_arm_ni: Support uncore ARM NI-700 PMU
On 31/01/2024 10:07, Yang Jialong 杨佳龙 wrote:
>
>
> 在 2024/1/31 15:59, Krzysztof Kozlowski 写道:
>> On 31/01/2024 08:08, JiaLong.Yang wrote:
>>> This code is based on uncore PMUs arm_smmuv3_pmu and arm-cmn.
>>> One ni-700 can have many clock domains. Each of them has only one PMU.
>>> Here one PMU corresponds to one 'struct ni_pmu' instance.
>>> PMU name will be ni_pmu_N_M, which N means different NI-700s and M means
>>> different PMU in one NI-700. If only one NI-700 found in NI-700, name will
>>> be ni_pmu_N.
>>> Node interface event name will be xxni_N_eventname, such as
>>> asni_0_rdreq_any. There are many kinds of type of nodes in one clock
>>> domain. Also means that there are many kinds of that in one PMU. So we
>>> distinguish them by xxni string. Besides, maybe there are many nodes
>>> have same type. So we have number N in event name.
>>> By ni_pmu_0_0/asni_0_rdreq_any/, we can pinpoint accurate bus traffic.
>>> Example1: perf stat -a -e ni_pmu_0_0/asni_0_rdreq_any/,ni_pmu_0_0/cycles/
>>> EXample2: perf stat -a -e ni_pmu_0_0/asni,id=0,event=0x0/
>>>
>>> Signed-off-by: JiaLong.Yang <jialong.yang@...ngroup.cn>
>>> ---
>>> v1 --> v2:
>>> 1. Submit MAINTANER Documentation/ files seperately.
>>
>> SEPARATE PATCHES, not patchsets. You have now checkpatch warnings
>> because of this...
>
> ...OK. But the MAINTANER file changing should be given in which one
> patches.
> I will submit patch v3 after talking and your permission.
>
>>
>>> 2. Delete some useless info printing.
>>> 3. Change print from pr_xxx to dev_xxx.
>>> 4. Fix more than 75 length log info.
>>> 5. Fix dts attribute pccs-id.
>>> 6. Fix generic name according to DT specification.
>>> 7. Some indentation.
>>> 8. Del of_match_ptr macro.
>>>
>>> drivers/perf/Kconfig | 11 +
>>> drivers/perf/Makefile | 1 +
>>> drivers/perf/hx_arm_ni.c | 1284 ++++++++++++++++++++++++++++++++++++++
>>> 3 files changed, 1296 insertions(+)
>>> create mode 100644 drivers/perf/hx_arm_ni.c
>>>
>>> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
>>> index ec6e0d9194a1..95ef8b13730f 100644
>>> --- a/drivers/perf/Kconfig
>>> +++ b/drivers/perf/Kconfig
>>> @@ -241,4 +241,15 @@ config CXL_PMU
>>>
>>> If unsure say 'm'.
>>>
>>> +config HX_ARM_NI_PMU
>>> + tristate "HX ARM NI-700 PMU"
>>> + depends on PPC_HX_C2000 && 64BIT
>>
>> 1. There is no PPC_HX_C2000.
>
> I have been used to using this macro. However this macro is not existed
> in mainline.
> I will replace it with ARM64. And del involved C code if OK.
>
> 64bit:
> __ffs(unsigned long) and __fls(unsigned long) will be wrong in 32bit. I
> pass a u64 argument.
One thing is where the code is supposed to run, second thing is compile
testing.
Why do you use __ffs, not __ffs64 which takes u64 if you really want
only 64bit argument? unsigned long != u64, so your code is not
architecture independent. You claim you wrote it on purpose as
non-architecture-independent, but then I claim it's a bug. We are
supposed to write code which is portable, as much as possible, assuming
it does not affect readability.
> struct ni_hw_perf_event will be big than limit.
> BUILD_BUG_ON(sizeof(struct ni_hw_perf_event) > offsetof(struct
> hw_perf_event, target));
And why do you need to use any of such code? Please open one of hundreds
of other drivers which work correctly on 32 and 64-bit platforms.
>
>> 2. Nothing justified dependency on 64bit. Drop or explain. Your previous
>> message did not provide real rationale.
>
> If ARM64, then drop.
..
..
>>> + /* Step2: Traverse all clock domains. */
>>> + for (cd_idx = 0; cd_idx < ni->cd_num; cd_idx++) {
>>> + cd = cd_arrays[cd_idx];
>>> +
>>> + num = ni_child_number(cd);
>>> + dev_dbg(dev, "The %dth clock domain has %d child nodes:", cd_idx, num);
>>> +
>>> + /* Omit pmu node */
>>> + ni_pmu = devm_kzalloc(dev, struct_size(ni_pmu, ev_src_nodes, num - 1),
>>> + GFP_KERNEL);
>>> + ni_pmu->ev_src_num = num - 1;
>>> +
>>> + if (!ni_pmu)
>>> + return -ENOMEM;
>>> +
>>> + num_idx = 0;
>>> + for (nd_idx = 0; nd_idx < num; nd_idx++) {
>>> + nd = ni_child_pointer(pbase, cd, nd_idx);
>>> +
>>> + node.base = nd;
>>> + node.node_type = ni_node_node_type(nd);
>>> +
>>> + if (unlikely(ni_node_type(nd) == NI_PMU))
>>> + ni_pmu->pmu_node = node;
>>> + else
>>> + ni_pmu->ev_src_nodes[num_idx++] = node;
>>> + dev_dbg(dev, " name: %s id: %d", ni_node_name[node.type], node.id);
>>> + }
>>> +
>>> + ni_pmu->dev = dev;
>>> + ni_pmu->ni = ni;
>>> + ni->ni_pmus[cd_idx] = ni_pmu;
>>> + }
>>> +
>>> + devm_kfree(dev, cd_arrays);
>>
>> Why? If it is not device-lifetime then allocate with usual way.
>>
>
> No device-lifetime.
> Will allocate in stack.
I was thinking about kzalloc. But if array is small, stack could be as well.
..
>>
>>> +
>>> +static const struct of_device_id ni_pmu_of_match[] = {
>>> + { .compatible = "hx,c2000-arm-ni" },
>>
>> Don't send undocumented compatibles.
>
> OK. Means I should send doc and code in one patch thread with more than
> one patch?
Yes. Please open lore.kernel.org and look at any other submissions
involving bindings or other type of ABI documentation (like sysfs).
Best regards,
Krzysztof
Powered by blists - more mailing lists