Message-ID: <20161022082806.GA4526@gmail.com>
Date: Sat, 22 Oct 2016 10:28:06 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: linux-kernel@...r.kernel.org, Linux Weekly News <lwn@....net>,
Andi Kleen <andi@...stfloor.org>,
David Ahern <dsahern@...il.com>,
Don Zickus <dzickus@...hat.com>, Jiri Olsa <jolsa@...nel.org>,
Joe Mario <jmario@...hat.com>,
Namhyung Kim <namhyung@...nel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Arnaldo Carvalho de Melo <acme@...hat.com>
Subject: Re: [GIT PULL 00/52] New Tool: perf c2c
* Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
> Hi Ingo,
>
> Please consider pulling into tip/perf/core,
>
> Thanks,
>
> - Arnaldo
>
> The following changes since commit 10b37cb59fa1e61fec1386f324615e0e8202cd87:
>
> Merge tag 'perf-vendor_events-for-mingo-20161018' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2016-10-19 15:22:26 +0200)
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-c2c-for-mingo-20161020
>
> for you to fetch changes up to 535bbde62701b2bb298063e9dfa007e8a1ff95d1:
>
> perf c2c report: Add --show-all option (2016-10-19 13:18:31 -0300)
>
> ----------------------------------------------------------------
> - The 'perf c2c' tool provides means for Shared Data C2C/HITM analysis.
>
> It allows you to track down cacheline contention. The tool is based
> on x86's load latency and precise store facility events provided by
> Intel CPUs.
>
> It was tested by Joe Mario and has proven useful, finding several
> cases of cacheline contention. Joe also wrote a blog post about the
> c2c tool, with examples:
>
> https://joemario.github.io/blog/2016/09/01/c2c-blog/
>
> Excerpt of the content on this site:
>
> ---
> At a high level, “perf c2c” will show you:
>
> * The cachelines where false sharing was detected.
> * The readers and writers to those cachelines, and the offsets where those accesses occurred.
> * The pid, tid, instruction addr, function name, binary object name for those readers and writers.
> * The source file and line number for each reader and writer.
> * The average load latency for the loads to those cachelines.
> * Which NUMA nodes the samples for a cacheline came from and which CPUs were involved.
>
> Using perf c2c is similar to using the Linux perf tool today.
> First collect data with “perf c2c record”, then generate a report with “perf c2c report”.
> ---
>
> There one finds extensive details on using the tool, with tips on
> reducing the volume of samples while still capturing enough to do
> its job. (Dick Fowles, Joe Mario, Don Zickus, Jiri Olsa)
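To make the false sharing mentioned in the description above concrete, here is a minimal sketch of the kind of workload the tool is meant to flag. It is not taken from the patch series or from the blog post: the names `shared_pair`, `padded_pair` and `run_pair` are hypothetical, and the 64-byte cacheline size is an assumption that holds on typical x86 parts but should be checked for the target CPU.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define CACHELINE 64	/* assumed cacheline size, not queried from the CPU */

/* Two counters in one cacheline: writers on different CPUs contend for it. */
struct shared_pair {
	volatile long a;
	volatile long b;
};

/* The same counters, each aligned to its own (assumed) cacheline. */
struct padded_pair {
	_Alignas(CACHELINE) volatile long a;
	_Alignas(CACHELINE) volatile long b;
};

/* Layout checks: 'b' lands in a different line only in the padded variant. */
_Static_assert(offsetof(struct shared_pair, b) < CACHELINE,
	       "shared_pair counters should share a cacheline");
_Static_assert(offsetof(struct padded_pair, b) >= CACHELINE,
	       "padded_pair counters should not share a cacheline");

struct worker_arg {
	volatile long *counter;
	long iters;
};

/* Thread body: increment one private counter 'iters' times. */
static void *bump(void *p)
{
	struct worker_arg *w = p;

	for (long i = 0; i < w->iters; i++)
		(*w->counter)++;
	return NULL;
}

/*
 * Run two threads, each incrementing its own counter 'iters' times.
 * Returns 0 on success, -1 if a thread could not be created.
 */
static int run_pair(volatile long *a, volatile long *b, long iters)
{
	pthread_t t1, t2;
	struct worker_arg wa = { a, iters }, wb = { b, iters };

	assert(iters >= 0);
	if (pthread_create(&t1, NULL, bump, &wa))
		return -1;
	if (pthread_create(&t2, NULL, bump, &wb)) {
		pthread_join(t1, NULL);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}
```

Both variants compute the same result; only the memory layout differs. Calling `run_pair()` on a `shared_pair` from a small `main()` (built with `-pthread`) and recording that with "perf c2c record" would be expected to produce HITM samples on the shared line, provided the two threads run on different CPUs, while the padded variant should not. Whether the HITMs show up as local or remote depends on where the threads are scheduled.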
>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@...hat.com>
>
> ----------------------------------------------------------------
> Jiri Olsa (52):
> perf c2c: Introduce c2c_decode_stats function
> perf c2c: Introduce c2c_add_stats function
> perf c2c: Add c2c command
> perf c2c: Add record subcommand
> perf c2c: Add report subcommand
> perf c2c report: Add dimension support
> perf c2c report: Add sort_entry dimension support
> perf c2c report: Fallback to standard dimensions
> perf c2c report: Add sample processing
> perf c2c report: Add cacheline hists processing
> perf c2c report: Decode c2c_stats for hist entries
> perf c2c report: Add header macros
> perf c2c report: Add 'dcacheline' dimension key
> perf c2c report: Add 'offset' dimension key
> perf c2c report: Add 'iaddr' dimension key
> perf c2c report: Add hitm related dimension keys
> perf c2c report: Add stores related dimension keys
> perf c2c report: Add loads related dimension keys
> perf c2c report: Add llc and remote loads related dimension keys
> perf c2c report: Add llc load miss dimension key
> perf c2c report: Add total record sort key
> perf c2c report: Add total loads sort key
> perf c2c report: Add hitm percent sort key
> perf c2c report: Add hitm/store percent related sort keys
> perf c2c report: Add dram related sort keys
> perf c2c report: Add 'pid' sort key
> perf c2c report: Add 'tid' sort key
> perf c2c report: Add 'symbol' and 'dso' sort keys
> perf c2c report: Add 'node' sort key
> perf c2c report: Add stats related sort keys
> perf c2c report: Add 'cpucnt' sort key
> perf c2c report: Add src line sort key
> perf c2c report: Setup number of header lines for hists
> perf c2c report: Set final resort fields
> perf c2c report: Add stdio output support
> perf c2c report: Add main TUI browser
> perf c2c report: Add TUI cacheline browser
> perf c2c report: Add global stats stdio output
> perf c2c report: Add shared cachelines stats stdio output
> perf c2c report: Add c2c related stats stdio output
> perf c2c report: Allow to report callchains
> perf c2c report: Limit the cachelines table entries
> perf c2c report: Add support to choose local HITMs
> perf c2c report: Allow to set cacheline sort fields
> perf c2c report: Recalc width of global sort entries
> perf c2c report: Add cacheline index entry
> perf c2c report: Add support to manage symbol name length
> perf c2c report: Iterate node display in browser
> perf c2c report: Add help windows
> perf c2c: Add man page and credits
> perf c2c report: Add --no-source option
> perf c2c report: Add --show-all option
>
> tools/perf/Build | 1 +
> tools/perf/Documentation/perf-c2c.txt | 282 ++++
> tools/perf/builtin-c2c.c | 2754 +++++++++++++++++++++++++++++++++
> tools/perf/builtin.h | 1 +
> tools/perf/perf.c | 1 +
> tools/perf/ui/browsers/hists.c | 2 +-
> tools/perf/ui/browsers/hists.h | 1 +
> tools/perf/util/hist.c | 1 +
> tools/perf/util/hist.h | 1 +
> tools/perf/util/mem-events.c | 128 ++
> tools/perf/util/mem-events.h | 37 +
> tools/perf/util/sort.c | 2 +-
> tools/perf/util/sort.h | 1 +
> 13 files changed, 3210 insertions(+), 2 deletions(-)
> create mode 100644 tools/perf/Documentation/perf-c2c.txt
> create mode 100644 tools/perf/builtin-c2c.c
Pulled the perf-c2c-for-mingo-20161021 tag, thanks a lot Arnaldo!
I can see some teething problems. For example, if I run it on an older kernel
(a v4.4 distro kernel), I get this:
triton:~/tip> perf c2c record perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 12.001 [sec]
12.001919 usecs/op
83320 ops/sec
[ perf record: Woken up 18 times to write data ]
[ perf record: Captured and wrote 5.356 MB perf.data (69804 samples) ]
but there's no 'perf c2c report' TUI output at all:
Shared Data Cache Line Table (0 entries, sorted on remote HITMs)
                    Total      Rmt  ----- LLC Load Hitm -----  ---- Store Reference ----  --- Load Dram ----      LLC    Total  ----- Core Load Hit -----  -- LLC Load Hit -
Index  Cacheline  records     Hitm    Total      Lcl      Rmt    Total    L1Hit   L1Miss       Lcl       Rmt  Ld Miss    Loads       FB       L1       L2      Llc       Rm
and just an empty screen.
If I do 'perf report' I get two events:
Available samples
24K cpu/mem-loads,ldlat=30/P
45K cpu/mem-stores/P
and both have some real data.
What am I missing?
Ingo