Date:   Wed, 1 Jun 2022 18:25:05 +0800
From:   Leo Yan <leo.yan@...aro.org>
To:     Joe Mario <jmario@...hat.com>
Cc:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>, Alyssa Ross <hi@...ssa.is>,
        Ian Rogers <irogers@...gle.com>, Like Xu <likexu@...cent.com>,
        Kajol Jain <kjain@...ux.ibm.com>,
        Li Huafei <lihuafei1@...wei.com>,
        Adam Li <adam.li@...erecomputing.com>,
        German Gomez <german.gomez@....com>,
        James Clark <james.clark@....com>,
        Ali Saidi <alisaidi@...zon.com>,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 00/12] perf c2c: Support display for Arm64

Hi Joe,

On Tue, May 31, 2022 at 02:44:07PM -0400, Joe Mario wrote:

[...]

> Hi Leo:
> I built a new perf with your patches and ran it on a 2-NUMA-node Neoverse platform.
> I then ran my simple test that creates reader and writer threads to tug on the same cacheline.
> The c2c output is appended below.
>
> The output looks good, especially where you've broken out the (average) cycles for local and remote peer loads.  
> And I'm glad to see you fixed the "Node" column.  I use that a lot to help detect remote node accesses.  

Thanks a lot for your testing and suggestions, which are really helpful!
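
For anyone who wants to reproduce this kind of cache line contention,
a minimal reader/writer test in the spirit of what Joe describes might
look like the sketch below.  This is illustrative only: Joe's actual
tugtest.c isn't posted in this thread, so the names and field layout
here are made up.

/*
 * Minimal false-sharing "tug" sketch: a writer thread stores to one
 * half of a 64-byte cache line while a reader thread loads from the
 * other half, generating the peer-snoop traffic that c2c reports.
 * Build: gcc -O2 -pthread tug.c -o tug
 */
#include <pthread.h>
#include <unistd.h>

struct line {
	volatile long w1, w2;	/* stored to by the writer */
	volatile long r1, r2;	/* loaded by the reader    */
} __attribute__((aligned(64)));	/* keep all fields on one line */

static struct line shared;
static volatile int stop;

static void *writer(void *arg)
{
	(void)arg;
	while (!stop) {
		shared.w1++;
		shared.w2++;
	}
	return NULL;
}

static void *reader(void *arg)
{
	long sum = 0;

	(void)arg;
	while (!stop)
		sum += shared.r1 + shared.r2;
	return (void *)sum;
}

int main(void)
{
	pthread_t w, r;

	pthread_create(&w, NULL, writer, NULL);
	pthread_create(&r, NULL, reader, NULL);
	sleep(10);	/* sampling window for perf c2c record */
	stop = 1;
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return 0;
}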

> And the "PA cnt" field is working as well, which is important for seeing if numa_balancing is moving the data around.

Good to know.  To be honest, I hadn't paid attention to the "PA cnt"
metric before.  After checking the code a bit, I can see this metric is
very useful for understanding how severely a cache line is accessed
from different physical addresses, which gives us a sense of how hard
a cache line is being hammered.
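
For anyone else trying this series, the usual flow to produce tables
like the ones quoted below is roughly as follows (flags per the
perf-c2c(1) man page; -N/--node-info can be repeated for more node
detail, and on Arm64 the record step relies on Arm SPE):

  # sample loads/stores while the workload runs
  perf c2c record -- ./tugtest

  # render the shared-cacheline tables on stdout
  perf c2c report -NN --stdio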

> =================================================
>            Shared Data Cache Line Table
> =================================================
> #
> #        ----------- Cacheline ----------     Peer  ------- Load Peer -------    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
> # Index             Address  Node  PA cnt    Snoop    Total    Local   Remote  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
> # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
> #
>       0            0x422140     0    6904   74.86%      137      131        6   148008   144970     3038        0        0     3038        0   144833      120        11        0         6        0         0         0
>       1  0xffffd976e63ae5c0     1       6    3.83%        7        7        0       15       15        0        0        0        0        0        8        4         3        0         0        0         0         0
>       2  0xffff07ffbf290980     0       5    2.19%        4        2        2       14       14        0        0        0        0        0       10        1         1        0         2        0         0         0
>       3  0xffffd976e57275c0     1       1    0.55%        1        1        0        1        1        0        0        0        0        0        0        1         0        0         0        0         0         0
>       4  0xffffd976e6071c00     1       3    0.55%        1        0        1        4        4        0        0        0        0        0        3        0         0        0         1        0         0         0
>      [snip]
> =================================================
>       Shared Cache Line Distribution Pareto
> =================================================
> #
> #        -- Peer Snoop --  ------- Store Refs ------  --------- Data address ---------                      ---------- cycles ----------    Total       cpu                               Shared
> #   Num      Rmt      Lcl   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt peer  lcl peer      load  records       cnt                      Symbol   Object                Source:Line  Node
> # .....  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  .......  ........  ..........................  .......  .........................  ....
> #
>   ----------------------------------------------------------------------
>       0        6      131        0        0     3038            0x422140
>   ----------------------------------------------------------------------
>            0.00%    0.00%    0.00%    0.00%   52.60%                 0x8     0       1            0x400e6c         0         0         0     1598         4  [.] writer                  tugtest  tugtest.c:152               0 1
>            0.00%    0.00%    0.00%    0.00%   47.40%                0x10     0       1            0x400e7c         0         0         0     1440         4  [.] writer                  tugtest  tugtest.c:153               0 1
>           33.33%   75.57%    0.00%    0.00%    0.00%                0x20     0       1            0x401018      4095      3803      3419      409         4  [.] reader                  tugtest  tugtest.c:187               0 1
>           66.67%   24.43%    0.00%    0.00%    0.00%                0x28     0       1            0x401034      4095      3470      3643      413         4  [.] reader                  tugtest  tugtest.c:187               0 1
> 
>   ----------------------------------------------------------------------
>       1        0        7        0        0        0  0xffffd976e63ae5c0
>   ----------------------------------------------------------------------
>            0.00%   57.14%    0.00%    0.00%    0.00%                 0x0     1       1  0xffffd976e4815fbc         0      1333         0        4         2  [k] ktime_get                   [kernel.kallsyms]  seqlock.h:276          1                   
>            0.00%   14.29%    0.00%    0.00%    0.00%                 0x0     1       1  0xffffd976e4816d10         0       266       794        4         3  [k] ktime_get_update_offsets_n  [kernel.kallsyms]  seqlock.h:276        0 1
>            0.00%   28.57%    0.00%    0.00%    0.00%                0x30     1       1  0xffffd976e4816d20         0        87       150        4         3  [k] ktime_get_update_offsets_n  [kernel.kallsyms]  timekeeping.c:2298   0 1
>   
>   ----------------------------------------------------------------------     
>       2        2        2        0        0        0  0xffff07ffbf290980
>   ----------------------------------------------------------------------
>           50.00%  100.00%    0.00%    0.00%    0.00%                 0x4     0       1  0xffffd976e47d2bdc      1217      1600      1147        4         3  [k] queued_spin_lock_slowpath  [kernel.kallsyms]  qspinlock.c:511    0 1
>           50.00%    0.00%    0.00%    0.00%    0.00%                 0x4     0       1  0xffffd976e47d2a2c      4033         0         0        1         1  [k] queued_spin_lock_slowpath  [kernel.kallsyms]  qspinlock.c:382    0 1
>   
>   ----------------------------------------------------------------------     
> 
> Thanks for doing this.  It looks good.

You are welcome!  And I very much appreciate your help in maturing the code.

> I'll assume someone else is reviewing your code changes.

Yeah, let's allow a bit more time for review.

Thanks,
Leo
