Date:   Sat, 22 Oct 2016 10:28:06 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Arnaldo Carvalho de Melo <acme@...nel.org>
Cc:     linux-kernel@...r.kernel.org, Linux Weekly News <lwn@....net>,
        Andi Kleen <andi@...stfloor.org>,
        David Ahern <dsahern@...il.com>,
        Don Zickus <dzickus@...hat.com>, Jiri Olsa <jolsa@...nel.org>,
        Joe Mario <jmario@...hat.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Arnaldo Carvalho de Melo <acme@...hat.com>
Subject: Re: [GIT PULL 00/52] New Tool: perf c2c


* Arnaldo Carvalho de Melo <acme@...nel.org> wrote:

> Hi Ingo,
> 
> 	Please consider pulling into tip/perf/core,
> 
> Thanks,
> 
> - Arnaldo
> 
> The following changes since commit 10b37cb59fa1e61fec1386f324615e0e8202cd87:
> 
>   Merge tag 'perf-vendor_events-for-mingo-20161018' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2016-10-19 15:22:26 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-c2c-for-mingo-20161020
> 
> for you to fetch changes up to 535bbde62701b2bb298063e9dfa007e8a1ff95d1:
> 
>   perf c2c report: Add --show-all option (2016-10-19 13:18:31 -0300)
> 
> ----------------------------------------------------------------
> - The 'perf c2c' tool provides means for Shared Data C2C/HITM analysis.
> 
>   It allows you to track down cacheline contention. The tool is based
>   on x86's load latency and precise store facility events provided by
>   Intel CPUs.
> 
>   It was tested by Joe Mario and has proven useful, finding some
>   cases of cacheline contention. Joe also wrote a blog post about the
>   c2c tool, with examples:
> 
>     https://joemario.github.io/blog/2016/09/01/c2c-blog/
> 
>   Excerpt of the content on this site:
> 
>   ---
>     At a high level, “perf c2c” will show you:
> 
>     * The cachelines where false sharing was detected.
>     * The readers and writers to those cachelines, and the offsets where those accesses occurred.
>     * The pid, tid, instruction addr, function name, binary object name for those readers and writers.
>     * The source file and line number for each reader and writer.
>     * The average load latency for the loads to those cachelines.
>     * Which NUMA nodes the samples for a cacheline came from and which CPUs were involved.
> 
>     Using perf c2c is similar to using the Linux perf tool today.
>     First collect data with “perf c2c record”, then generate a report with “perf c2c report”.
>   ---
> 
>   There one finds extensive details on using the tool, with tips on
>   reducing the volume of samples while still capturing enough to do
>   its job. (Dick Fowles, Joe Mario, Don Zickus, Jiri Olsa)
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@...hat.com>
> 
> ----------------------------------------------------------------
> Jiri Olsa (52):
>       perf c2c: Introduce c2c_decode_stats function
>       perf c2c: Introduce c2c_add_stats function
>       perf c2c: Add c2c command
>       perf c2c: Add record subcommand
>       perf c2c: Add report subcommand
>       perf c2c report: Add dimension support
>       perf c2c report: Add sort_entry dimension support
>       perf c2c report: Fallback to standard dimensions
>       perf c2c report: Add sample processing
>       perf c2c report: Add cacheline hists processing
>       perf c2c report: Decode c2c_stats for hist entries
>       perf c2c report: Add header macros
>       perf c2c report: Add 'dcacheline' dimension key
>       perf c2c report: Add 'offset' dimension key
>       perf c2c report: Add 'iaddr' dimension key
>       perf c2c report: Add hitm related dimension keys
>       perf c2c report: Add stores related dimension keys
>       perf c2c report: Add loads related dimension keys
>       perf c2c report: Add llc and remote loads related dimension keys
>       perf c2c report: Add llc load miss dimension key
>       perf c2c report: Add total record sort key
>       perf c2c report: Add total loads sort key
>       perf c2c report: Add hitm percent sort key
>       perf c2c report: Add hitm/store percent related sort keys
>       perf c2c report: Add dram related sort keys
>       perf c2c report: Add 'pid' sort key
>       perf c2c report: Add 'tid' sort key
>       perf c2c report: Add 'symbol' and 'dso' sort keys
>       perf c2c report: Add 'node' sort key
>       perf c2c report: Add stats related sort keys
>       perf c2c report: Add 'cpucnt' sort key
>       perf c2c report: Add src line sort key
>       perf c2c report: Setup number of header lines for hists
>       perf c2c report: Set final resort fields
>       perf c2c report: Add stdio output support
>       perf c2c report: Add main TUI browser
>       perf c2c report: Add TUI cacheline browser
>       perf c2c report: Add global stats stdio output
>       perf c2c report: Add shared cachelines stats stdio output
>       perf c2c report: Add c2c related stats stdio output
>       perf c2c report: Allow to report callchains
>       perf c2c report: Limit the cachelines table entries
>       perf c2c report: Add support to choose local HITMs
>       perf c2c report: Allow to set cacheline sort fields
>       perf c2c report: Recalc width of global sort entries
>       perf c2c report: Add cacheline index entry
>       perf c2c report: Add support to manage symbol name length
>       perf c2c report: Iterate node display in browser
>       perf c2c report: Add help windows
>       perf c2c: Add man page and credits
>       perf c2c report: Add --no-source option
>       perf c2c report: Add --show-all option
> 
>  tools/perf/Build                      |    1 +
>  tools/perf/Documentation/perf-c2c.txt |  282 ++++
>  tools/perf/builtin-c2c.c              | 2754 +++++++++++++++++++++++++++++++++
>  tools/perf/builtin.h                  |    1 +
>  tools/perf/perf.c                     |    1 +
>  tools/perf/ui/browsers/hists.c        |    2 +-
>  tools/perf/ui/browsers/hists.h        |    1 +
>  tools/perf/util/hist.c                |    1 +
>  tools/perf/util/hist.h                |    1 +
>  tools/perf/util/mem-events.c          |  128 ++
>  tools/perf/util/mem-events.h          |   37 +
>  tools/perf/util/sort.c                |    2 +-
>  tools/perf/util/sort.h                |    1 +
>  13 files changed, 3210 insertions(+), 2 deletions(-)
>  create mode 100644 tools/perf/Documentation/perf-c2c.txt
>  create mode 100644 tools/perf/builtin-c2c.c

Pulled the perf-c2c-for-mingo-20161021 tag, thanks a lot Arnaldo!

I can see some teething problems. For example, if I run it on an older kernel
(a v4.4 distro kernel), I get this:

 triton:~/tip> perf c2c record perf bench sched pipe
 # Running 'sched/pipe' benchmark:
 # Executed 1000000 pipe operations between two processes

     Total time: 12.001 [sec]

      12.001919 usecs/op
          83320 ops/sec
 [ perf record: Woken up 18 times to write data ]
 [ perf record: Captured and wrote 5.356 MB perf.data (69804 samples) ]

but 'perf c2c report' shows no entries in the TUI, only the table header:

 Shared Data Cache Line Table     (0 entries, sorted on remote HITMs)                                                                                                                  
                              Total      Rmt  ----- LLC Load Hitm -----  ---- Store Reference ----  --- Load Dram ----      LLC    Total  ----- Core Load Hit -----  -- LLC Load Hit -
 Index           Cacheline  records     Hitm    Total      Lcl      Rmt    Total    L1Hit   L1Miss       Lcl       Rmt  Ld Miss    Loads       FB       L1       L2       Llc       Rm
                                                                                                                                                                                     
and just an empty screen.

If I do 'perf report' I get two events:

 Available samples
 24K cpu/mem-loads,ldlat=30/P
 45K cpu/mem-stores/P

and both have some real data.

What am I missing?

	Ingo
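
The reproduction described in this message can be collected into a small shell sketch. It is an illustration only and not part of the original thread: the c2c_repro function name and the PERF override variable are hypothetical, and the '-d lcl' step assumes the display option described in the perf-c2c man page, which switches the table from remote to local HITMs (the empty table above is sorted on remote HITMs).

```shell
#!/bin/sh
# Sketch of the reproduction steps from the message above.
# PERF and c2c_repro are hypothetical names used for illustration only.
PERF="${PERF:-perf}"

c2c_repro() {
    # Record load/store samples for the given workload into ./perf.data.
    "$PERF" c2c record "$@" || return 1

    # Print the shared cacheline table on stdout instead of opening the TUI.
    "$PERF" c2c report --stdio || return 1

    # Assumption: '-d lcl' sorts on local rather than remote HITMs, which
    # may matter on a machine where no remote HITMs were sampled.
    "$PERF" c2c report --stdio -d lcl || return 1

    # Cross-check that the mem-loads and mem-stores events hold samples.
    "$PERF" report --stdio
}
```

For the workload used in the message, this would be invoked as 'c2c_repro perf bench sched pipe'.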
