Message-ID: <150cb3ae-fbb3-e0e1-57af-6f5b28222fdb@huawei.com>
Date: Mon, 27 Mar 2023 12:03:56 +0800
From: Yicong Yang <yangyicong@...wei.com>
To: "Chen, Tim C" <tim.c.chen@...el.com>,
"acme@...nel.org" <acme@...nel.org>,
"mark.rutland@....com" <mark.rutland@....com>,
"peterz@...radead.org" <peterz@...radead.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"james.clark@....com" <james.clark@....com>,
"alexander.shishkin@...ux.intel.com"
<alexander.shishkin@...ux.intel.com>,
"linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: <yangyicong@...ilicon.com>,
"Jonathan.Cameron@...wei.com" <Jonathan.Cameron@...wei.com>,
"21cnbao@...il.com" <21cnbao@...il.com>,
"prime.zeng@...ilicon.com" <prime.zeng@...ilicon.com>,
"shenyang39@...wei.com" <shenyang39@...wei.com>,
"linuxarm@...wei.com" <linuxarm@...wei.com>
Subject: Re: [PATCH] perf stat: Support per-cluster aggregation
Hi Tim,
On 2023/3/25 2:05, Chen, Tim C wrote:
>>
>> From: Yicong Yang <yangyicong@...ilicon.com>
>>
>> Some platforms have a 'cluster' topology, where CPUs in a cluster share
>> resources like the L3 Cache Tag (on HiSilicon Kunpeng SoC) or the L2 cache
>> (on Intel Jacobsville). Parsing and building the cluster topology has been
>> supported since [1].
>>
>> perf stat already supports aggregation for other topology levels such as
>> die or socket. It is useful to aggregate per-cluster as well, to find
>> problems like L3T bandwidth contention or imbalance.
>>
>> This patch adds support for "--per-cluster" option for per-cluster aggregation.
>> Also update the docs and related test. The output will be like:
>>
>> [root@...alhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5
>>
>> Performance counter stats for 'system wide':
>>
>> S56-D0-CLS158 4 1,321,521,570 LLC-load
>> S56-D0-CLS594 4 794,211,453 LLC-load
>> S56-D0-CLS1030 4 41,623 LLC-load
>> S56-D0-CLS1466 4 41,646 LLC-load
>> S56-D0-CLS1902 4 16,863 LLC-load
>> S56-D0-CLS2338 4 15,721 LLC-load
>> S56-D0-CLS2774 4 22,671 LLC-load
>> [...]
>
> Overall it looks good. You can add my reviewed-by.
>
Thanks.
> I wonder if we could enhance the help message
> in perf stat to tell user to refer to
> /sys/devices/system/cpu/cpuX/topology/*_id
> to map relevant ids back to overall cpu topology.
>
> For example the above example, cluster S56-D0-CLS158 has
> really heavy load. It took me a while
> going through the code to figure out how to find
> the info that maps cluster id to cpu.
>
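
For reference, the mapping Tim describes can be dumped with a small shell
sketch like the one below. This assumes the standard Linux sysfs topology
layout; the cluster_id file is only present on kernels that parse cluster
topology (see [1]), so the script falls back to "-" when a file is missing.

```shell
#!/bin/sh
# Sketch: list each CPU with the package and cluster ids reported by sysfs.
# cluster_id may be absent on older kernels or platforms without cluster
# topology, hence the "-" fallback.
for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    topo="$cpu/topology"
    [ -d "$topo" ] || continue
    pkg=$(cat "$topo/physical_package_id" 2>/dev/null || echo "-")
    cls=$(cat "$topo/cluster_id" 2>/dev/null || echo "-")
    printf '%s: package %s cluster %s\n' "${cpu##*/}" "$pkg" "$cls"
done
```

Matching the printed cluster ids against the CLS numbers in the perf stat
output then tells you which CPUs belong to a hot cluster.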
Yes, indeed. Actually this is because my BIOS doesn't report a valid ID
for these topology levels, so ACPI uses the offset of the topology node
in the PPTT as a fallback. Other topology levels suffer from the same issue.
On my machine:
[root@...alhost debug]# perf stat --per-socket -e cycles -a -- sleep 1
Performance counter stats for 'system wide':
S56 64 21,563,375 cycles
S7182 64 32,140,641 cycles
1.008520310 seconds time elapsed
On x86:
root@...ntu204:/home/yang/linux/tools/perf# ./perf stat -a --per-socket -e cycles -- sleep 1
Performance counter stats for 'system wide':
S0 40 137,205,897 cycles
S1 40 67,720,731 cycles
1.003546720 seconds time elapsed
Maybe I can add a separate patch documenting the source of the topology
IDs in the perf manual.
Thanks,
Yicong