Message-ID: <e87bc3d7-6664-a1b7-faee-6117aa1d121c@os.amperecomputing.com>
Date: Thu, 19 May 2022 17:06:18 +0800
From: Adam Li <adamli@...amperecomputing.com>
To: Leo Yan <leo.yan@...aro.org>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Like Xu <likexu@...cent.com>, Ian Rogers <irogers@...gle.com>,
Alyssa Ross <hi@...ssa.is>, Kajol Jain <kjain@...ux.ibm.com>,
Li Huafei <lihuafei1@...wei.com>,
German Gomez <german.gomez@....com>,
James Clark <james.clark@....com>,
Kan Liang <kan.liang@...ux.intel.com>,
Ali Saidi <alisaidi@...zon.com>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 9/11] perf c2c: Sort on peer snooping for load
operations
Hi Leo,
Thanks for the update.
On 5/18/2022 2:12 PM, Leo Yan wrote:
> Please note, in the total statistics, all remote accesses are
> accounted into the metric "rmt_hit", so "rmt_hit" includes accesses to
> remote DRAM or any upper cache levels because we cannot distinguish
> them.
>
Agree that "Load Remote HIT" makes more sense than "Load Remote DRAM".
> From my experiment, with this update the output result is promising
> for the peer accesses and it's easier to inspect false sharing.
>
> As you might see I have prepared a git repo:
> https://git.linaro.org/people/leo.yan/linux-spe.git/ branch:
> perf_c2c_arm_spe_peer_v3, which contains the updated patches for both
> memory flag setting and perf c2c related patches.
>
> Could you confirm if the updated code works for you or not?
>
I tested the v3 patches (perf_c2c_arm_spe_peer_v3 branch) on a 2P Altra system.
Compared with v2, "Snoop Peer" indicates cache false sharing more clearly
for the 'false_sharing.exe' test case.
Below are the details:
# perf c2c record -- numactl -m 0 ./false_sharing.exe 2
183 mticks, reader_thd (thread 2), on node 0 (cpu 78).
195 mticks, reader_thd (thread 3), on node 1 (cpu 124).
546 mticks, lock_th (thread 0), on node 0 (cpu 0).
562 mticks, lock_th (thread 1), on node 1 (cpu 123).
[ perf record: Woken up 36 times to write data ]
[ perf record: Captured and wrote 72.440 MB perf.data ]
# perf c2c report -d peer --coalesce tid,pid,iaddr,dso -N --stdio
Warning:
Arm SPE CONTEXT packets not found in the traces.
Matching of TIDs to SPE events could be inaccurate.
Warning:
AUX data detected collision 20 times out of 168!
=================================================
Total records : 1198728
Locked Load/Store Operations : 0
Load Operations : 1031196
Loads - uncacheable : 0
Loads - IO : 0
Loads - Miss : 0
Loads - no mapping : 0
Load Fill Buffer Hit : 0
Load L1D hit : 970636
Load L2D hit : 292
Load LLC hit : 2477
Load Local HITM : 0
Load Remote HITM : 0
Load Remote HIT : 56459
Load Local DRAM : 1332
Load Remote DRAM : 0
Load MESI State Exclusive : 1332
Load MESI State Shared : 0
Load LLC Misses : 57791
Load access blocked by data : 0
Load access blocked by address : 0
Load HIT Peer : 58814
LLC Misses to Local DRAM : 2.3%
LLC Misses to Remote DRAM : 0.0%
LLC Misses to Remote cache (HIT) : 97.7%
LLC Misses to Remote cache (HITM) : 0.0%
Store Operations : 167532
Store - uncacheable : 0
Store - no mapping : 0
Store L1D Hit : 0
Store L1D Miss : 0
Store No available memory level : 167532
No Page Map Rejects : 1234
Unable to parse data source : 0
=================================================
Global Shared Cache Line Event Information
=================================================
Total Shared Cache Lines : 45
Load HITs on shared lines : 226254
Fill Buffer Hits on shared lines : 0
L1D hits on shared lines : 166010
L2D hits on shared lines : 4
Load HITs on peer cache lines : 58814
LLC hits on shared lines : 2455
Locked Access on shared lines : 0
Blocked Access on shared lines : 0
Store HITs on shared lines : 96403
Store L1D hits on shared lines : 0
Store No available memory level : 96403
Total Merged records : 96403
=================================================
c2c details
=================================================
Events : arm_spe_0/ts_enable=1,load_filter=1,store_filter=1,min_latency=30/
: dummy:u
: memory
Cachelines sort on : Snoop Peers
Cacheline data grouping : offset,tid,pid,iaddr,dso
=================================================
Shared Data Cache Line Table
=================================================
#
# ----------- Cacheline ---------- Snoop ------- Load Hitm ------- Snoop Total Total Total --------- Stores -------- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
# Index Address Node PA cnt Peer Total LclHitm RmtHitm Peer records Loads Stores L1Hit L1Miss N/A FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
# ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
#
0 0x420180 N/A 0 95.53% 0 0 0 56183 246056 219522 26534 0 0 26534 0 161914 0 106 0 56176 0 1326 0
1 0x420100 N/A 0 4.37% 0 0 0 2571 76437 6576 69861 0 0 69861 0 4005 0 2335 0 236 0 0 0
[...]
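For reference, below is a minimal sketch of the kind of false-sharing
pattern that produces the high "Snoop Peer" percentage on cacheline
0x420180 above. This is only my own illustration, not the actual
false_sharing.exe source: the structure layout, thread counts, names and
iteration count are assumptions. Two "lock" threads keep storing to one
field while two "reader" threads keep loading a neighbouring field in the
same cache line, so every store forces peer snoops on the readers' copies.

/* Minimal false-sharing sketch (illustrative only). */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Both fields deliberately share one cache line. */
struct shared {
	atomic_long lock_word;	/* stored to by the lock threads */
	atomic_long read_word;	/* loaded by the reader threads  */
};

static struct shared line;

#define ITERS 50000000L

static void *lock_thd(void *arg)
{
	(void)arg;
	for (long i = 0; i < ITERS; i++)
		atomic_fetch_add(&line.lock_word, 1);
	return NULL;
}

static void *reader_thd(void *arg)
{
	long sum = 0;

	(void)arg;
	for (long i = 0; i < ITERS; i++)
		sum += atomic_load(&line.read_word);
	return (void *)sum;
}

int main(void)
{
	pthread_t t[4];

	/* Two writers and two readers; in practice they would be spread
	 * across nodes (e.g. with numactl/taskset) before recording with
	 * perf c2c as shown above. */
	pthread_create(&t[0], NULL, lock_thd, NULL);
	pthread_create(&t[1], NULL, lock_thd, NULL);
	pthread_create(&t[2], NULL, reader_thd, NULL);
	pthread_create(&t[3], NULL, reader_thd, NULL);

	for (int i = 0; i < 4; i++)
		pthread_join(t[i], NULL);

	printf("lock_word = %ld\n", atomic_load(&line.lock_word));
	return 0;
}

Built with something like "gcc -O2 -pthread -std=c11 false_sharing.c -o
false_sharing" and recorded the same way as the commands above.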
Thanks,
-adam