Message-ID: <fefc1f23-ea1c-6339-77c4-b0974cbd6e93@amperemail.onmicrosoft.com>
Date: Fri, 13 May 2022 17:05:45 +0800
From: Adam Li <adamli@...eremail.onmicrosoft.com>
To: Leo Yan <leo.yan@...aro.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Like Xu <likexu@...cent.com>, Ian Rogers <irogers@...gle.com>,
Alyssa Ross <hi@...ssa.is>, Kajol Jain <kjain@...ux.ibm.com>,
Li Huafei <lihuafei1@...wei.com>,
German Gomez <german.gomez@....com>,
James Clark <james.clark@....com>,
Kan Liang <kan.liang@...ux.intel.com>,
Ali Saidi <alisaidi@...zon.com>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 9/11] perf c2c: Sort on peer snooping for load
operations
On 5/8/2022 5:23 PM, Leo Yan wrote:
> Except the existed three display options 'tot', 'rmt', 'lcl', this patch
> adds a new option 'peer' so can sort on the cache hit for peer snooping.
>
> For displaying with option 'peer', the "Shared Data Cache Line Table" and
> "Shared Cache Line Distribution Pareto" both sort with the metrics
> "ld_peer". As result, we can get the 'peer' display as below:
>
> # perf c2c report -d peer --coalesce tid,pid,iaddr,dso -N --stdio
>
Hi Leo,
I tested the v2 patch on a 2P Altra system.
When the false-sharing data comes mainly from the remote node, 'Snoop Peers'
cannot indicate the severity of the false sharing. As shown in the output below,
there are only 10 'Load HIT Peer' records, while there are 2353
'Load Remote DRAM' records.
Also, the name 'Load Remote DRAM' is somewhat misleading, since we cannot tell
whether the data source is really DRAM.
Run the false_sharing test (https://github.com/joemario/perf-c2c-usage-files)
with one lock_th on node 0 and one reader_thd on node 1:
# perf c2c record -- numactl -m 0 ./false_sharing.exe 1
131 mticks, reader_thd (thread 1), on node 1 (cpu 80).
145 mticks, lock_th (thread 0), on node 0 (cpu 9).
[ perf record: Woken up 16 times to write data ]
[ perf record: Captured and wrote 33.726 MB perf.data ]
# perf c2c report -d peer --coalesce tid,pid,iaddr,dso -N --stdio
Warning:
Arm SPE CONTEXT packets not found in the traces.
Matching of TIDs to SPE events could be inaccurate.
Warning:
AUX data detected collision 6 times out of 47!
=================================================
Trace Event Information
=================================================
Total records : 551944
Locked Load/Store Operations : 0
Load Operations : 493082
Loads - uncacheable : 0
Loads - IO : 0
Loads - Miss : 0
Loads - no mapping : 0
Load Fill Buffer Hit : 0
Load L1D hit : 490589
Load L2D hit : 117
Load LLC hit : 11
Load HIT Peer : 10
Load Local HITM : 0
Load Remote HITM : 0
Load Remote HIT : 0
Load Local DRAM : 2
Load Remote DRAM : 2353
Load MESI State Exclusive : 2355
Load MESI State Shared : 0
Load LLC Misses : 2355
Load access blocked by data : 0
Load access blocked by address : 0
LLC Misses to Local DRAM : 0.1%
LLC Misses to Remote DRAM : 99.9%
LLC Misses to Remote cache (HIT) : 0.0%
LLC Misses to Remote cache (HITM) : 0.0%
Store Operations : 58862
Store - uncacheable : 0
Store - no mapping : 0
Store L1D Hit : 0
Store L1D Miss : 0
Store No available memory level : 58862
No Page Map Rejects : 490
Unable to parse data source : 0
=================================================
Global Shared Cache Line Event Information
=================================================
Total Shared Cache Lines : 9
Load HITs on shared lines : 21
Fill Buffer Hits on shared lines : 0
L1D hits on shared lines : 6
L2D hits on shared lines : 1
Load HITs on peer cache lines : 10
LLC hits on shared lines : 0
Locked Access on shared lines : 0
Blocked Access on shared lines : 0
Store HITs on shared lines : 0
Store L1D hits on shared lines : 0
Store No available memory level : 0
Total Merged records : 0
=================================================
c2c details
=================================================
Events : arm_spe_0/ts_enable=1,load_filter=1,store_filter=1,min_latency=30/
: dummy:u
: memory
Cachelines sort on : Snoop Peers
Cacheline data grouping : offset,tid,pid,iaddr,dso
[...]
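To make the imbalance in the numbers above explicit, here is a rough sketch (not taken from perf itself) that recomputes the LLC-miss breakdown from the raw counts in the "Trace Event Information" table. It assumes that LLC misses are the sum of the local DRAM, remote DRAM, remote HIT and remote HITM loads, which matches the 2355 reported above. The dictionary keys and the helper name llc_miss_breakdown are hypothetical, chosen only for illustration.

```python
# Counts copied from the "Trace Event Information" table in this mail.
# The key names are illustrative, not perf's internal field names.
counts = {
    "lcl_dram": 2,     # Load Local DRAM
    "rmt_dram": 2353,  # Load Remote DRAM
    "rmt_hit": 0,      # Load Remote HIT
    "rmt_hitm": 0,     # Load Remote HITM
    "ld_peer": 10,     # Load HIT Peer
}


def llc_miss_breakdown(c):
    """Return (total LLC misses, {source: percent of LLC misses}).

    Assumption: LLC misses = local DRAM + remote DRAM + remote HIT
    + remote HITM loads.
    """
    sources = ("lcl_dram", "rmt_dram", "rmt_hit", "rmt_hitm")
    total = sum(c[s] for s in sources)
    pct = {s: (100.0 * c[s] / total if total else 0.0) for s in sources}
    return total, pct


total, pct = llc_miss_breakdown(counts)
print(f"LLC misses: {total}")
for name, value in pct.items():
    print(f"  {name:8s}: {value:5.1f}%")

# Peer hits are tiny next to the remote loads, so sorting on 'peer'
# alone hides most of the cross-node traffic in this run.
ratio = counts["ld_peer"] / counts["rmt_dram"]
print(f"peer hits per remote DRAM load: {ratio:.4f}")
```

With these counts, almost every LLC miss is attributed to 'Load Remote DRAM', while 'Load HIT Peer' accounts for well under one percent of that volume.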
Thanks,
-adam