Message-ID: <6cd44ff7-d339-d9a4-a134-2b8b9b3dbbfa@huawei.com>
Date: Wed, 29 Mar 2023 20:46:55 +0800
From: Yicong Yang <yangyicong@...wei.com>
To: Namhyung Kim <namhyung@...il.com>,
"Chen, Tim C" <tim.c.chen@...el.com>
CC: <yangyicong@...ilicon.com>, "acme@...nel.org" <acme@...nel.org>,
"mark.rutland@....com" <mark.rutland@....com>,
"peterz@...radead.org" <peterz@...radead.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"james.clark@....com" <james.clark@....com>,
"alexander.shishkin@...ux.intel.com"
<alexander.shishkin@...ux.intel.com>,
"linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Jonathan.Cameron@...wei.com" <Jonathan.Cameron@...wei.com>,
"21cnbao@...il.com" <21cnbao@...il.com>,
"prime.zeng@...ilicon.com" <prime.zeng@...ilicon.com>,
"shenyang39@...wei.com" <shenyang39@...wei.com>,
"linuxarm@...wei.com" <linuxarm@...wei.com>
Subject: Re: [PATCH] perf stat: Support per-cluster aggregation
On 2023/3/29 14:47, Namhyung Kim wrote:
> Hello,
>
> On Fri, Mar 24, 2023 at 11:09 AM Chen, Tim C <tim.c.chen@...el.com> wrote:
>>
>>>
>>> From: Yicong Yang <yangyicong@...ilicon.com>
>>>
>>> Some platforms have a 'cluster' topology, where CPUs in a cluster share
>>> resources like the L3 cache tag (on HiSilicon Kunpeng SoC) or the L2 cache
>>> (on Intel Jacobsville). Parsing and building the cluster topology have been
>>> supported since [1].
>>>
>>> perf stat already supports aggregation for other topologies like die or
>>> socket. It'll be useful to aggregate per-cluster to find problems like L3T
>>> bandwidth contention or imbalance.
>>>
>>> This patch adds a "--per-cluster" option for per-cluster aggregation and
>>> updates the docs and the related test. The output will look like:
>>>
>>> [root@...alhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5
>>>
>>> Performance counter stats for 'system wide':
>>>
>>> S56-D0-CLS158            4      1,321,521,570      LLC-load
>>> S56-D0-CLS594            4        794,211,453      LLC-load
>>> S56-D0-CLS1030           4             41,623      LLC-load
>>> S56-D0-CLS1466           4             41,646      LLC-load
>>> S56-D0-CLS1902           4             16,863      LLC-load
>>> S56-D0-CLS2338           4             15,721      LLC-load
>>> S56-D0-CLS2774           4             22,671      LLC-load
>>> [...]
>>
>> Overall it looks good. You can add my reviewed-by.
>>
>> I wonder if we could enhance the help message
>> in perf stat to tell users to refer to
>> /sys/devices/system/cpu/cpuX/topology/*_id
>> to map the relevant IDs back to the overall CPU topology.
>>
>> In the example above, cluster S56-D0-CLS158 has a
>> really heavy load. It took me a while of going
>> through the code to figure out how to map a
>> cluster ID back to its CPUs.
>
> Maybe we could enhance the cpu filter to accept something
> like -C S56-D0-CLS158.
>
You mean specifying the CPUs by a topology ID like S56-D0-CLS158, so
that we actually filter to the CPUs in cluster 158?
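
For reference, the mapping Tim mentioned can be read from sysfs
directly. A rough sketch (assuming the kernel exposes cluster_id, with
158 as the example cluster ID):

  for c in /sys/devices/system/cpu/cpu[0-9]*; do
          # print the CPU whose cluster_id matches the one we look for
          [ "$(cat "$c/topology/cluster_id")" = "158" ] && basename "$c"
  done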
> I also wonder what happens if it runs on an old kernel which doesn't
> have the cluster_id file.

It should still work, but the output may not be proper for the cluster.
There's no die topology nor the related sysfs attributes on arm64, but
--per-die works like this:
[root@...alhost perf]# perf stat -a -e cycles --per-die -- sleep 1

 Performance counter stats for 'system wide':

S56-D0              64         12,700,186      cycles
S7182-D0            64         20,297,320      cycles

       1.003638080 seconds time elapsed
On a legacy kernel without the cluster sysfs attributes, the output will
look like:
[root@...alhost perf]# perf stat -a -e cycles --per-cluster -- sleep 1

 Performance counter stats for 'system wide':

S56-D0-CLS-1        64         12,634,251      cycles
S7182-D0-CLS-1      64         16,348,322      cycles

       1.003696680 seconds time elapsed
The patch just assigns -1 as the cluster ID in that case. I'll modify this
to keep the output consistent with --per-die. Thanks for catching this!
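
A quick way to tell whether a given kernel exports the cluster topology
at all is to check for the sysfs file. Just a sketch:

  if [ -r /sys/devices/system/cpu/cpu0/topology/cluster_id ]; then
          echo "cluster topology available"
  else
          echo "legacy kernel: --per-cluster uses the fallback ID"
  fi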
Thanks,
Yicong