Date:   Mon, 23 May 2022 09:43:10 -0300
From:   Arnaldo Carvalho de Melo <acme@...nel.org>
To:     Jiri Olsa <olsajiri@...il.com>
Cc:     Leo Yan <leo.yan@...aro.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Like Xu <likexu@...cent.com>, Alyssa Ross <hi@...ssa.is>,
        Ian Rogers <irogers@...gle.com>,
        Kajol Jain <kjain@...ux.ibm.com>,
        Adam Li <adamli@...eremail.onmicrosoft.com>,
        Li Huafei <lihuafei1@...wei.com>,
        German Gomez <german.gomez@....com>,
        James Clark <james.clark@....com>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Ali Saidi <alisaidi@...zon.com>, Joe Mario <jmario@...hat.com>,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 00/11] perf c2c: Support display for Arm64

On Mon, May 23, 2022 at 10:43:47AM +0200, Jiri Olsa wrote:
> On Wed, May 18, 2022 at 01:57:18PM +0800, Leo Yan wrote:
> > Arm64 Neoverse CPUs support the data source field in Arm SPE trace,
> > which allows us to detect cache line contention and transfers.
> > 
> > Unlike the x86 architecture, Arm SPE trace data cannot provide the
> > 'HITM' snooping flag. Ali Saidi has a patch set v9, "perf: arm-spe:
> > Decode SPE source and use for perf c2c" [1], which introduces a 'peer'
> > flag and synthesizes memory samples with this flag.
> > 
> > Based on patch set [1], this patch set finishes the second half of the
> > work: it consumes the 'peer' flag in the perf c2c tool and adds an
> > extra 'peer' display mode.

Ok, I'll look at the base patch set...
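The cover letter contrasts two signals. x86 reports a 'HITM' snoop flag, while Arm SPE (via patch set [1]) provides a 'peer' flag. A minimal sketch of how a load sample might be routed to a contention counter follows. The `Snoop` bit values, the counter names, the `classify_load()` helper and the rule that HITM takes precedence over peer are all hypothetical and chosen for illustration. They are not perf's ABI or code.

```python
# Hypothetical, simplified model of a load sample's snoop information.
# Bit values and counter names are illustrative assumptions, not the
# perf_mem_data_src layout used by the kernel.
from enum import IntFlag


class Snoop(IntFlag):
    NONE = 0
    HITM = 1 << 0   # x86: load hit a modified line in another cache
    PEER = 1 << 1   # Arm SPE (via patch set [1]): data came from a peer cache


def classify_load(snoop: Snoop, remote: bool) -> str:
    """Return the (hypothetical) contention counter a load sample feeds."""
    if snoop & Snoop.HITM:          # assumption: HITM wins if both are set
        return "rmt_hitm" if remote else "lcl_hitm"
    if snoop & Snoop.PEER:
        return "ld_peer"
    return "other"
```

On x86 the interesting counters are the HITM ones. On Arm64 only `ld_peer` can be non-zero, which is why a separate 'peer' display mode is needed.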

> > Patches 01, 02 and 03 add support for 'N/A' metrics for store operations.
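The idea behind the 'N/A' store metric can be sketched as bucketing. A store whose memory level is not available is counted in its own bucket instead of being dropped, so L1Hit + L1Miss + N/A always equals the store total. The function and bucket names below are hypothetical, not perf's internal field names. The usage check reuses the x86 numbers from the table later in this mail: 2024 + 470 + 0 = 2494 stores.

```python
# Hypothetical sketch: bucket store samples by their reported L1 outcome.
# A store with no available memory level goes to 'N/A' rather than being
# lost, so the buckets always sum to the number of stores seen.
from collections import Counter
from typing import Iterable, Optional


def bucket_stores(l1_hit_flags: Iterable[Optional[bool]]) -> Counter:
    """Each item is True (L1 hit), False (L1 miss) or None (level N/A)."""
    buckets = Counter({"L1Hit": 0, "L1Miss": 0, "N/A": 0})
    for hit in l1_hit_flags:
        if hit is None:
            buckets["N/A"] += 1
        elif hit:
            buckets["L1Hit"] += 1
        else:
            buckets["L1Miss"] += 1
    return buckets


# Usage with the x86 cache line 0 numbers shown below in this mail.
b = bucket_stores([True] * 2024 + [False] * 470)
print(b["L1Hit"], b["L1Miss"], b["N/A"], sum(b.values()))
```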
> > 
> > Patches 04 and 05 add statistics and dimensions for memory samples
> > with the peer flag.
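The peer statistics follow the v2 change noted further down: a peer load is accounted both in its cache-level metric and in `ld_peer`. A minimal sketch of that double accounting is below. The `LoadStats` class and its fields are hypothetical, and mapping the peer loads to the LLC-local bucket is an assumption for this example. The assumption is consistent with the Arm SPE row in the table below, where Peer (99) equals LLC LclHit (99) and L1 (18752) plus LclHit (99) equals Total Loads (18851).

```python
# Hypothetical sketch of accounting peer loads twice: once in their
# cache-level bucket and once in a separate ld_peer counter.
from dataclasses import dataclass


@dataclass
class LoadStats:
    loads: int = 0
    l1: int = 0
    lcl_hit: int = 0    # assumption: peer loads land in the LLC-local bucket
    ld_peer: int = 0

    def add(self, level: str, peer: bool) -> None:
        self.loads += 1
        if level == "L1":
            self.l1 += 1
        elif level == "LLC":
            self.lcl_hit += 1
        if peer:            # counted in addition to the cache-level bucket
            self.ld_peer += 1
```

Because `ld_peer` overlaps the level buckets, it should not be added to them when totalling loads.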
> > 
> > Patches 06, 07 and 08 are refactoring: they give the code more general
> > naming, which makes it easier to extend display modes that are not
> > strictly bound to HITM tags.
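Renaming 'percent_hitm' to 'percent_costly_snoop' suggests one percentage whose underlying metric depends on the display mode. A minimal sketch follows. The mode-to-metric mapping and the function are hypothetical. The mode names mirror the `tot`/`lcl`/`rmt` modes plus the new 'peer' mode discussed in this thread. With the x86 HITM totals from the table below (252 and 129), it yields the 66.14% and 33.86% shown in the first column of the Shared Data Cache Line Table.

```python
# Hypothetical sketch: each cache line's share of the "costly snoop"
# metric selected by the display mode, across all reported cache lines.
from typing import Dict, List

METRIC_FOR_MODE = {"tot": "tot_hitm", "lcl": "lcl_hitm",
                   "rmt": "rmt_hitm", "peer": "ld_peer"}


def percent_costly_snoop(lines: List[Dict[str, int]], mode: str) -> List[float]:
    key = METRIC_FOR_MODE[mode]
    total = sum(line[key] for line in lines)
    if total == 0:
        return [0.0 for _ in lines]
    return [round(100.0 * line[key] / total, 2) for line in lines]
```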
> > 
> > Patches 09, 10 and 11 add the 'peer' display mode, update the
> > documentation, and make 'peer' the default display mode on Arm64.
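Choosing a per-architecture default can be sketched as a small lookup. Using Python's `platform.machine()` and the `"aarch64"`/`"arm64"` strings is an assumption of this example, and so is the `default_display_mode()` helper. perf makes this decision in its own C code. The fallback to total HITM ('tot') reflects the existing default for other architectures.

```python
# Hypothetical sketch of picking the default c2c display mode by arch.
import platform
from typing import Optional


def default_display_mode(machine: Optional[str] = None) -> str:
    machine = machine or platform.machine()
    # Arm SPE provides no HITM flag, so sort on peer snooping instead.
    if machine in ("aarch64", "arm64"):
        return "peer"
    return "tot"    # x86 and others keep the total-HITM default
```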
> > 
> > This patch set has been verified with both x86 and Arm64 memory samples.
> > 
> > The display result with x86 memory samples:
> > 
> >   =================================================
> >              Shared Data Cache Line Table          
> >   =================================================
> >   #
> >   #        ----------- Cacheline ----------      Tot  ------- Load Hitm -------    Snoop    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
> >   # Index             Address  Node  PA cnt     Hitm    Total  LclHitm  RmtHitm     Peer  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
> >   # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
> >   #
> >         0      0x55c8971f0080     0    1967   66.14%      252      252        0        0     6044     3550     2494     2024      470        0      528     2672       78        20      252         0        0         0         0
> >         1      0x55c8971f00c0     0       1   33.86%      129      129        0        0      914      914        0        0        0        0      272      374       52        87      129         0        0         0         0
> > 
> >   =================================================
> >         Shared Cache Line Distribution Pareto      
> >   =================================================
> >   #
> >   #        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                      --------------- cycles ---------------    Total       cpu                                     Shared                               
> >   #   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol             Object              Source:Line  Node
> >   # .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  .................  .......................  ....
> >   #
> >     -------------------------------------------------------------------------------
> >         0        0      252        0     2024      470        0      0x55c8971f0080
> >     -------------------------------------------------------------------------------
> >              0.00%   12.30%    0.00%    0.00%    0.00%    0.00%                 0x0     0       1      0x55c8971ed3e9         0      1313       863         0     1222         3  [.] 0x00000000000013e9  false_sharing.exe  false_sharing.exe[13e9]   0
> >              0.00%    0.79%    0.00%   90.51%    0.00%    0.00%                 0x0     0       1      0x55c8971ed3e2         0      1800       878         0     3029         3  [.] 0x00000000000013e2  false_sharing.exe  false_sharing.exe[13e2]   0
> >              0.00%    0.00%    0.00%    9.49%  100.00%    0.00%                 0x0     0       1      0x55c8971ed3f4         0         0         0         0      662         3  [.] 0x00000000000013f4  false_sharing.exe  false_sharing.exe[13f4]   0
> >              0.00%   86.90%    0.00%    0.00%    0.00%    0.00%                0x20     0       1      0x55c8971ed447         0       141       103         0     1131         2  [.] 0x0000000000001447  false_sharing.exe  false_sharing.exe[1447]   0
> > 
> >     -------------------------------------------------------------------------------
> >         1        0      129        0        0        0        0      0x55c8971f00c0
> >     -------------------------------------------------------------------------------
> >              0.00%  100.00%    0.00%    0.00%    0.00%    0.00%                0x20     0       1      0x55c8971ed455         0        88        94         0      914         2  [.] 0x0000000000001455  false_sharing.exe  false_sharing.exe[1455]   0
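A reading aid for the x86 Shared Data Cache Line Table: Total records equals Total Loads plus Total Stores, and the load-side columns (FB through Load Dram Rmt) add up to Total Loads. The sketch below checks both relations for the two x86 rows, using values copied from the table. Treating the load columns as a strict partition is an assumption that happens to hold for these rows. The column keys are names chosen for this example.

```python
# Values copied from the x86 "Shared Data Cache Line Table" above.
LOAD_COLUMNS = ["FB", "L1", "L2", "LclHit", "LclHitm",
                "RmtHit", "RmtHitm", "DramLcl", "DramRmt"]

X86_ROWS = [
    {"records": 6044, "loads": 3550, "stores": 2494,
     "FB": 528, "L1": 2672, "L2": 78, "LclHit": 20, "LclHitm": 252,
     "RmtHit": 0, "RmtHitm": 0, "DramLcl": 0, "DramRmt": 0},
    {"records": 914, "loads": 914, "stores": 0,
     "FB": 272, "L1": 374, "L2": 52, "LclHit": 87, "LclHitm": 129,
     "RmtHit": 0, "RmtHitm": 0, "DramLcl": 0, "DramRmt": 0},
]


def row_is_consistent(row: dict) -> bool:
    """True if load columns sum to Total Loads and records = loads + stores."""
    loads_ok = sum(row[c] for c in LOAD_COLUMNS) == row["loads"]
    return loads_ok and row["records"] == row["loads"] + row["stores"]
```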
> > 
> > 
> > The display result with Arm SPE memory samples:
> > 
> >   =================================================
> >              Shared Data Cache Line Table          
> >   =================================================
> >   #
> >   #        ----------- Cacheline ----------    Snoop  ------- Load Hitm -------    Snoop    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
> >   # Index             Address  Node  PA cnt     Peer    Total  LclHitm  RmtHitm     Peer  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
> >   # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
> >   #
> >         0      0xaaaac17d6000   N/A       0  100.00%        0        0        0       99    18851    18851        0        0        0        0        0    18752        0        99        0         0        0         0         0
> > 
> >   =================================================
> >         Shared Cache Line Distribution Pareto      
> >   =================================================
> >   #
> >   #        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                      --------------- cycles ---------------    Total       cpu                                    Shared                       
> >   #   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol            Object      Source:Line  Node
> >   # .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  ................  ...............  ....
> >   #
> >     -------------------------------------------------------------------------------
> >         0        0        0       99        0        0        0      0xaaaac17d6000
> >     -------------------------------------------------------------------------------
> >              0.00%    0.00%    6.06%    0.00%    0.00%    0.00%                0x20   N/A       0      0xaaaac17c25ac         0         0        43       375    18469         2  [.] 0x00000000000025ac  memstress         memstress[25ac]   0
> >              0.00%    0.00%   93.94%    0.00%    0.00%    0.00%                0x29   N/A       0      0xaaaac17c3e88         0         0       173       180      135         2  [.] 0x0000000000003e88  memstress         memstress[3e88]   0
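The first table ranks cache lines against each other. The Pareto table works differently: each row is one code address's share of its own cache line's metric. The sketch below illustrates this for the Arm 'Snoop Peer' column. The per-address counts 6 and 93 are back-computed from the 6.06% / 93.94% shown above and the line total of 99, so treat them as illustrative rather than as perf output. `pareto_shares()` is a hypothetical helper.

```python
# Hypothetical sketch: per-address shares of one cache line's metric,
# as in the Pareto table rows. Counts used below are back-computed.
from typing import List


def pareto_shares(counts: List[int]) -> List[float]:
    total = sum(counts)
    return [round(100.0 * c / total, 2) if total else 0.0 for c in counts]


print(pareto_shares([6, 93]))   # back-computed peer counts for line 0
```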
> > 
> > [1] https://lore.kernel.org/lkml/20220517020326.18580-1-alisaidi@amazon.com/
> > 
> > Changes from v2:
> > * Updated patch 04 to account samples with the PEER flag in both the
> >   cache level metrics and ld_peer;
> > * Updated the documentation for metric 'rmt_hit', which accounts for
> >   all remote accesses (including remote DRAM and any upward caches).
> 
> LGTM
> 
> Acked-by: Jiri Olsa <jolsa@...nel.org>
> 
> thanks,
> jirka
> 
> > 
> > Changes from v1:
> > * Updated patches 01, 02 and 03 to support 'N/A' metrics for store
> >   operations, so they align with patch set [1] for store samples.
> > 
> > 
> > Leo Yan (11):
> >   perf mem: Add stats for store operation with no available memory level
> >   perf c2c: Add dimensions for 'N/A' metrics of store operation
> >   perf c2c: Update documentation for store metric 'N/A'
> >   perf mem: Add statistics for peer snooping
> >   perf c2c: Add dimensions for peer load operations
> >   perf c2c: Use explicit names for display macros
> >   perf c2c: Rename dimension from 'percent_hitm' to
> >     'percent_costly_snoop'
> >   perf c2c: Refactor node header
> >   perf c2c: Sort on peer snooping for load operations
> >   perf c2c: Update documentation for new display option 'peer'
> >   perf c2c: Use 'peer' as default display for Arm64
> > 
> >  tools/perf/Documentation/perf-c2c.txt |  34 ++-
> >  tools/perf/builtin-c2c.c              | 357 ++++++++++++++++++++------
> >  tools/perf/util/mem-events.c          |  25 +-
> >  tools/perf/util/mem-events.h          |   2 +
> >  4 files changed, 331 insertions(+), 87 deletions(-)
> > 
> > -- 
> > 2.25.1
> > 

-- 

- Arnaldo
