Message-ID: <CAM9d7cjuvB_67zSmaLAJJ-zS3RL5F59k8p+oqsuzJEOkAUx=WQ@mail.gmail.com>
Date:   Tue, 20 Oct 2020 17:13:01 +0900
From:   Namhyung Kim <namhyung@...nel.org>
To:     Leo Yan <leo.yan@...aro.org>
Cc:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...hat.com>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Ian Rogers <irogers@...gle.com>, Joe Mario <jmario@...hat.com>,
        David Ahern <dsahern@...il.com>,
        Don Zickus <dzickus@...hat.com>, Al Grant <Al.Grant@....com>,
        James Clark <james.clark@....com>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load

Hello,

On Thu, Oct 15, 2020 at 11:51 PM Leo Yan <leo.yan@...aro.org> wrote:
>
> If the memory event doesn't carry a HITM tag (as with Arm SPE), we
> cannot rely on the HITM display to report cache false sharing.
> Alternatively, we can use the LLC access and multi-thread info to
> locate the data address of the potential false sharing.  If we then
> connect this with the source code and analyze the threads' execution
> timing, we can conclude whether they load and store the same cache
> line at the same time, which helps to resolve the false sharing issue.
>
> This patch set is to enable the display with sorting on LLC load
> accesses; it adds dimensions for total LLC hit and LLC load accesses,
> and these dimensions are used for shared cache line table and pareto.
>
> This patch set depends on the patch set "perf c2c: Refine the
> organization of metrics" [1].
>
> [1] https://lore.kernel.org/patchwork/cover/1321499/
>
> With this patch set, the 'llc' display looks as follows:
>
>   # perf c2c report -d llc --coalesce tid,pid,iaddr,dso --stdio

I'm not sure if you ran the test on x86 or ARM.
IIUC ARM should have 0 local hitm, right?

Thanks
Namhyung

>
>   [...]
>
>   =================================================
>              Shared Data Cache Line Table
>   =================================================
>   #
>   #        ----------- Cacheline ----------  LLC Hit   LLC Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
>   # Index             Address  Node  PA cnt      Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
>   # .....  ..................  ....  ......  .......  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
>   #
>         0      0x563b01e83100     0    1401   65.32%       648     7011     3738     3273     2582      691      515     2516       59       143      505         0        0         0         0
>         1      0x563b01e830c0     0       1   26.51%       263      400      400        0        0        0      130        3        4       262        1         0        0         0         0
>         2      0x563b01e83080     0       1    7.76%        77      650      650        0        0        0      180      348       45        14       63         0        0         0         0
>         3  0xffff88c3d74e82c0     0       1    0.10%         1        1        1        0        0        0        0        0        0         1        0         0        0         0         0
>         4  0xffffa587c11e38c0   N/A       0    0.10%         1        2        1        1        1        0        0        0        0         1        0         0        0         0         0
>         5  0xffffffffbd5e6fc0     0       1    0.10%         1        1        1        0        0        0        0        0        0         0        1         0        0         0         0
>         6      0x7f90a4d6c2c0     0       1    0.10%         1        1        1        0        0        0        0        0        0         1        0         0        0         0         0
>
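
Reading the table above, the two new columns appear to be derived from the
"LLC Load Hit" pair: "LLC Hit Total" matches LclHit + LclHitm for every row,
and "LLC Hit Pct" matches that total divided by the sum over all listed cache
lines. This relation is inferred from the numbers shown, not taken from the
patch itself, and the function name below is hypothetical. A minimal Python
sketch that recomputes both columns under that assumption:

```python
# (LclHit, LclHitm) per cache line, copied from the table above.
llc_load_hits = {
    "0x563b01e83100": (143, 505),
    "0x563b01e830c0": (262, 1),
    "0x563b01e83080": (14, 63),
    "0xffff88c3d74e82c0": (1, 0),
    "0xffffa587c11e38c0": (1, 0),
    "0xffffffffbd5e6fc0": (0, 1),
    "0x7f90a4d6c2c0": (1, 0),
}


def llc_hit_columns(rows):
    """Return {address: (total, pct)} assuming total = LclHit + LclHitm.

    The percentage is each line's share of the summed totals; this is an
    assumption inferred from the table, not perf's documented formula.
    """
    totals = {addr: hit + hitm for addr, (hit, hitm) in rows.items()}
    grand_total = sum(totals.values())
    return {
        addr: (total, f"{100.0 * total / grand_total:.2f}%")
        for addr, total in totals.items()
    }


for addr, (total, pct) in llc_hit_columns(llc_load_hits).items():
    print(f"{addr:>18}  {pct:>7}  {total:>8}")
```

With the seven rows listed, the recomputed totals sum to 992 and the
percentages agree with the "LLC Hit Pct" column.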
>   =================================================
>         Shared Cache Line Distribution Pareto
>   =================================================
>   #
>   #        ---- LLC LD ----  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                  Shared
>   #   Num   LclHit  LclHitm   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
>   # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
>   #
>     -------------------------------------------------------------
>         0      143      505     2582      691      0x563b01e83100
>     -------------------------------------------------------------
>             96.50%    7.72%   46.79%    0.00%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c16         0      1949      1331     1876         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
>              0.00%   35.05%    0.00%    0.00%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c1d         0      2651       975      748         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
>              0.00%   30.89%    0.00%    0.00%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c1d         0      1425      1003      762         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
>              2.10%    7.52%   49.19%    0.00%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c16         0      1585      1053     2037         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
>              0.00%    0.00%    2.52%   44.86%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c28         0         0         0      375         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
>              0.00%    0.00%    1.51%   55.14%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c28         0         0         0      420         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
>              1.40%   12.87%    0.00%    0.00%                0x20     0       1    14100    14104:reader_thd      0x563b01c81c73         0       166        99      417         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
>              0.00%    5.94%    0.00%    0.00%                0x20     0       1    14100    14105:reader_thd      0x563b01c81c73         0       144        85      376         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
>
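
The Pareto rows above show the pattern the cover letter describes: several
threads hit the same cache line through the LLC, some of them storing, at
different offsets within the line. The Python sketch below flags such a line
as a false sharing candidate. The heuristic and the function names are
hypothetical and only illustrate the reasoning; they are not what perf c2c
implements, and a real conclusion still needs the source code and timing
analysis mentioned in the cover letter.

```python
from collections import defaultdict

# (tid, offset in line, LclHit %, LclHitm %, store L1 hit %, store L1 miss %)
# copied from the Pareto rows for cache line 0x563b01e83100 above.
pareto_rows = [
    (14102, 0x00, 96.50, 7.72, 46.79, 0.00),
    (14102, 0x00, 0.00, 35.05, 0.00, 0.00),
    (14103, 0x00, 0.00, 30.89, 0.00, 0.00),
    (14103, 0x00, 2.10, 7.52, 49.19, 0.00),
    (14102, 0x00, 0.00, 0.00, 2.52, 44.86),
    (14103, 0x00, 0.00, 0.00, 1.51, 55.14),
    (14104, 0x20, 1.40, 12.87, 0.00, 0.00),
    (14105, 0x20, 0.00, 5.94, 0.00, 0.00),
]


def summarize(rows):
    """Group thread ids by offset, separately for LLC loads and stores."""
    loaders = defaultdict(set)
    storers = defaultdict(set)
    for tid, offset, hit, hitm, st_hit, st_miss in rows:
        if hit > 0 or hitm > 0:
            loaders[offset].add(tid)
        if st_hit > 0 or st_miss > 0:
            storers[offset].add(tid)
    return loaders, storers


def is_false_sharing_candidate(rows):
    """Hypothetical heuristic: more than one thread touches the line, at
    least one thread stores to it, and more than one offset is accessed."""
    loaders, storers = summarize(rows)
    tids = set().union(*loaders.values(), *storers.values())
    offsets = set(loaders) | set(storers)
    return len(tids) > 1 and any(storers.values()) and len(offsets) > 1


print(is_false_sharing_candidate(pareto_rows))
```

Here threads 14102 and 14103 load and store at offset 0x0 while 14104 and
14105 only load at offset 0x20, so the line is flagged.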
>   [...]
>
>
> Leo Yan (8):
>   perf mem: Add structure field c2c_stats::tot_llchit
>   perf c2c: Add dimensions for total LLC hit
>   perf c2c: Add dimensions for LLC load hit
>   perf c2c: Change to general naming for macros
>   perf c2c: Rename for shared cache line stats
>   perf c2c: Refactor hist entry validation
>   perf c2c: Add option '-d llc' for sorting with LLC load
>   perf c2c: Update documentation for display option 'llc'
>
>  tools/perf/Documentation/perf-c2c.txt |  18 +-
>  tools/perf/builtin-c2c.c              | 333 +++++++++++++++++++++-----
>  tools/perf/util/mem-events.c          |   3 +
>  tools/perf/util/mem-events.h          |   1 +
>  4 files changed, 286 insertions(+), 69 deletions(-)
>
> --
> 2.17.1
>
