lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d15bb8a2-2203-cabf-69de-627d0a45efdc@arm.com>
Date:   Thu, 19 May 2022 15:19:09 +0100
From:   James Clark <james.clark@....com>
To:     Leo Yan <leo.yan@...aro.org>
Cc:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Like Xu <likexu@...cent.com>, Ian Rogers <irogers@...gle.com>,
        Alyssa Ross <hi@...ssa.is>, Kajol Jain <kjain@...ux.ibm.com>,
        Li Huafei <lihuafei1@...wei.com>,
        German Gomez <german.gomez@....com>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Ali Saidi <alisaidi@...zon.com>,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 00/11] perf c2c: Support display for Arm64



On 08/05/2022 10:23, Leo Yan wrote:
> Arm64 Neoverse CPUs supports data source in Arm SPE trace, this allows
> us to detect cache line contention and transfers.
> 
> Unlike x86 architecture, Arm SPE trace data cannot provide 'HITM'
> snooping flag, Ali Said has a patch set "perf: arm-spe: Decode SPE
> source and use for perf c2c" [1] which introduces 'peer' flag and
> synthesizes memory samples with this flag.
> 
> Based on patch set [1], this patch set is to finish the second half work
> to consume the 'peer' flag in perf c2c tool, it adds an extra display
> 'peer' mode.
> 
> Patches 01, 02 and 03 are to support 'N/A' metrics for store operations.
> 
> Patches 04 and 05 adds statistics and dimensions for memory samples with
> peer flag.
> 
> Patches 06, 07, 08 are for refactoring, it refines the code with more
> general naming so this can allow us to easier to extend display modes
> but not strictly bound to HITM tags.
> 
> Patches 09, 10 and 11 are to extend display 'peer' mode, it also updates
> the document and also changes to use 'peer' mode as default mode on
> Arm64 arches.
> 
> This patch set has been verified for both x86 and Arm64 memory samples.
> 
> The display result with x86 memory samples:
> 
>   =================================================
>              Shared Data Cache Line Table          
>   =================================================
>   #
>   #        ----------- Cacheline ----------      Tot  ------- Load Hitm -------    Snoop    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
>   # Index             Address  Node  PA cnt     Hitm    Total  LclHitm  RmtHitm     Peer  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
>   # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
>   #
>         0      0x55c8971f0080     0    1967   66.14%      252      252        0        0     6044     3550     2494     2024      470        0      528     2672       78        20      252         0        0         0         0
>         1      0x55c8971f00c0     0       1   33.86%      129      129        0        0      914      914        0        0        0        0      272      374       52        87      129         0        0         0         0
> 
>   =================================================
>         Shared Cache Line Distribution Pareto      
>   =================================================
>   #
>   #        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                      --------------- cycles ---------------    Total       cpu                                     Shared                               
>   #   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol             Object              Source:Line  Node
>   # .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  .................  .......................  ....
>   #
>     -------------------------------------------------------------------------------
>         0        0      252        0     2024      470        0      0x55c8971f0080
>     -------------------------------------------------------------------------------
>              0.00%   12.30%    0.00%    0.00%    0.00%    0.00%                 0x0     0       1      0x55c8971ed3e9         0      1313       863         0     1222         3  [.] 0x00000000000013e9  false_sharing.exe  false_sharing.exe[13e9]   0
>              0.00%    0.79%    0.00%   90.51%    0.00%    0.00%                 0x0     0       1      0x55c8971ed3e2         0      1800       878         0     3029         3  [.] 0x00000000000013e2  false_sharing.exe  false_sharing.exe[13e2]   0
>              0.00%    0.00%    0.00%    9.49%  100.00%    0.00%                 0x0     0       1      0x55c8971ed3f4         0         0         0         0      662         3  [.] 0x00000000000013f4  false_sharing.exe  false_sharing.exe[13f4]   0
>              0.00%   86.90%    0.00%    0.00%    0.00%    0.00%                0x20     0       1      0x55c8971ed447         0       141       103         0     1131         2  [.] 0x0000000000001447  false_sharing.exe  false_sharing.exe[1447]   0
> 
>     -------------------------------------------------------------------------------
>         1        0      129        0        0        0        0      0x55c8971f00c0
>     -------------------------------------------------------------------------------
>              0.00%  100.00%    0.00%    0.00%    0.00%    0.00%                0x20     0       1      0x55c8971ed455         0        88        94         0      914         2  [.] 0x0000000000001455  false_sharing.exe  false_sharing.exe[1455]   0
> 
> 
> The display result with Arm SPE memory samples:
> 
>   =================================================
>              Shared Data Cache Line Table          
>   =================================================
>   #
>   #        ----------- Cacheline ----------    Snoop  ------- Load Hitm -------    Snoop    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
>   # Index             Address  Node  PA cnt     Peer    Total  LclHitm  RmtHitm     Peer  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
>   # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
>   #
>         0      0xaaaac17d6000   N/A       0  100.00%        0        0        0       99    18851    18851        0        0        0        0        0    18752        0         0        0         0        0         0         0
> 
>   =================================================
>         Shared Cache Line Distribution Pareto      
>   =================================================
>   #
>   #        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                      --------------- cycles ---------------    Total       cpu                                    Shared                       
>   #   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol            Object      Source:Line  Node
>   # .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  ................  ...............  ....
>   #
>     -------------------------------------------------------------------------------
>         0        0        0       99        0        0        0      0xaaaac17d6000
>     -------------------------------------------------------------------------------
>              0.00%    0.00%    6.06%    0.00%    0.00%    0.00%                0x20   N/A       0      0xaaaac17c25ac         0         0        43       375    18469         2  [.] 0x00000000000025ac  memstress         memstress[25ac]   0
>              0.00%    0.00%   93.94%    0.00%    0.00%    0.00%                0x29   N/A       0      0xaaaac17c3e88         0         0       173       180      135         2  [.] 0x0000000000003e88  memstress         memstress[3e88]   0
> 

Hi Leo,

I see that the HITM and Store Refs columns (even N/A) are all 0% on Arm.
Assuming it's a similar example to the x86 one above, does that mean that
these are always expected to be 0? If so, should they be removed to avoid
confusion, or is it just a case of the example not producing any non 0
data for these columns?

Thanks
James

> [1] https://lore.kernel.org/lkml/20220504184850.24986-1-alisaidi@amazon.com/
> 
> Changes from v1:
> * Update patches 01, 02 and 03 to support 'N/A' metrics for store
>   operations, so can align with the patch set [1] for store samples.
> 
> 
> Leo Yan (11):
>   perf mem: Add stats for store operation with no available memory level
>   perf c2c: Add dimensions for 'N/A' metrics of store operation
>   perf c2c: Update documentation for store metric 'N/A'
>   perf mem: Add statistics for peer snooping
>   perf c2c: Add dimensions for peer load operations
>   perf c2c: Use explicit names for display macros
>   perf c2c: Rename dimension from 'percent_hitm' to
>     'percent_costly_snoop'
>   perf c2c: Refactor node header
>   perf c2c: Sort on peer snooping for load operations
>   perf c2c: Update documentation for new display option 'peer'
>   perf c2c: Use 'peer' as default display for Arm64
> 
>  tools/perf/Documentation/perf-c2c.txt |  30 ++-
>  tools/perf/builtin-c2c.c              | 363 ++++++++++++++++++++------
>  tools/perf/util/mem-events.c          |  14 +-
>  tools/perf/util/mem-events.h          |   2 +
>  4 files changed, 323 insertions(+), 86 deletions(-)
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ