Message-ID: <20160929091912.GV5012@twins.programming.kicks-ass.net>
Date: Thu, 29 Sep 2016 11:19:12 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Jiri Olsa <jolsa@...nel.org>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
Michael Trapp <michael.trapp@....com>,
"Long, Wai Man" <waiman.long@....com>,
Stanislav Ievlev <stanislav.ievlev@...il.com>,
Kim Phillips <kim.phillips@....com>,
lkml <linux-kernel@...r.kernel.org>,
Don Zickus <dzickus@...hat.com>, Joe Mario <jmario@...hat.com>,
Ingo Molnar <mingo@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
David Ahern <dsahern@...il.com>,
Andi Kleen <andi@...stfloor.org>,
Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCHv4 00/57] perf c2c: Add new tool to analyze cacheline contention on NUMA systems

On Thu, Sep 22, 2016 at 05:36:28PM +0200, Jiri Olsa wrote:
> hi,
> sending new version of c2c patches (v3) originally posted in here:
> http://lwn.net/Articles/588866/
>
> I took the old set and reworked it to fit into current upstream code.
> It follows the same logic as original patch and provides (almost) the
> same stdio interface. In addition new TUI interface was added.
>
> The perf c2c tool provides means for Shared Data C2C/HITM analysis.
> It allows you to track down the cacheline contentions. The tool is
> based on x86's load latency and precise store facility events provided
> by Intel CPUs.
>
> The tool was tested by Joe Mario and has proven to be useful and found
> some cachelines contentions. Joe also wrote a blog about c2c tool with
> examples located in here:
>
> https://joemario.github.io/blog/2016/09/01/c2c-blog/
>
> v4 changes:
> - 4 patches already queued
> - used u32 for c2c_stats instead of int [Stanislav]
> - fixed NO_SLANG=1 compilation [Kim]
> - add __hist_entry__snprintf helper [Arnaldo]
>
> Code is also available in:
> git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> perf/c2c_v4
>
> Testing:
> $ perf c2c record -a [workload]
> $ perf c2c report [--stdio]
> $ man perf-c2c
>
> It's most likely you won't generate any remote HITMs on common
> laptops, so to get results for local HITMs please use:
>
> $ perf c2c report -d lcl [--stdio]

I'll just keep repeating; this is not the tool I want :-( I'll not block
this tool, but I also think it's far less usable than it should've been.

https://lkml.kernel.org/r/20151209093402.GM6356@twins.programming.kicks-ass.net

What I want is a tool that maps memop events (any PEBS memops) back to a
'type::member' form and sorts on that. It would not rely on the PEBS
'Data Linear Address' field, as that is useless for dynamically
allocated bits. Instead it would use the IP and DWARF information to
deduce the 'type::member' of the memop.
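
To make the 'sorts on that' part concrete, here is a minimal sketch of the aggregation step only. It assumes the hard part (going from IP plus DWARF to a type and a byte offset) has already been done; the LAYOUTS table, the member names and offsets, and the helper names are all hypothetical and are not part of perf or of any existing tool.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical layout table: type name -> sorted list of (offset, size, member).
# Illustrative values only; a real tool would read this from DWARF, the way
# pahole does.
LAYOUTS = {
    "task_struct": [(0, 8, "state"), (8, 8, "stack"), (16, 4, "usage"), (20, 4, "flags")],
}

def member_at(type_name, offset):
    """Return the member of type_name covering byte offset, or None for
    padding, out-of-range offsets, or unknown types."""
    members = LAYOUTS.get(type_name, [])
    starts = [m[0] for m in members]
    i = bisect_right(starts, offset) - 1   # last member starting at or before offset
    if i < 0:
        return None
    start, size, name = members[i]
    return name if offset < start + size else None

def sort_by_member(samples):
    """samples: iterable of (type_name, offset), one per memop sample.
    Returns [('type::member', count), ...] sorted by count, highest first."""
    counts = Counter()
    for type_name, offset in samples:
        name = member_at(type_name, offset)
        if name is not None:
            counts[f"{type_name}::{name}"] += 1
    return counts.most_common()
```

Because the key is 'type::member' rather than a data address, samples from every dynamically allocated instance of the type fold into the same bucket.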

I want pahole-like output, showing me where the hits (green) and misses
(red) are in a structure.

I want to be able to 'perf memops report -EC task_struct' and see the
expanded task_struct (as per 'pahole -EC task_struct') annotated, not a
data address for each task in my workload (which could be 100+ and
entirely useless).
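
Roughly the kind of output I mean, as a sketch under stated assumptions: 'perf memops' does not exist, and the annotate() helper, its arguments and the column layout below are made up for illustration. Per-member hit and miss counts are taken as given, already summed over all instances of the type.

```python
GREEN, RED, RESET = "\033[32m", "\033[31m", "\033[0m"

def annotate(type_name, members, stats, color=False):
    """Render a pahole-style listing with per-member hit/miss counts.

    members: list of (ctype, name, offset, size), in layout order
    stats:   dict mapping member name -> (hits, misses); members without
             samples are shown with zero counts
    color:   wrap hits in green and misses in red using ANSI escapes
    """
    lines = [f"struct {type_name} {{"]
    for ctype, name, offset, size in members:
        hits, misses = stats.get(name, (0, 0))
        h, m = f"{hits:>6} hits", f"{misses:>6} misses"
        if color:
            h, m = f"{GREEN}{h}{RESET}", f"{RED}{m}{RESET}"
        decl = f"{ctype} {name};"
        lines.append(f"\t{decl:<24} /* {offset:>5} {size:>5} */ {h} {m}")
    lines.append("};")
    return "\n".join(lines)

if __name__ == "__main__":
    # Made-up layout and counts, purely to show the shape of the report.
    demo_members = [("long", "state", 0, 8), ("void *", "stack", 8, 8),
                    ("unsigned int", "flags", 16, 4)]
    demo_stats = {"state": (120, 45), "flags": (300, 2)}
    print(annotate("example", demo_members, demo_stats, color=True))
```

One listing per type, regardless of how many instances the workload touched, is the whole point.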

Currently this is somewhat involved, since DWARF doesn't include type
information for all memops, so we'd have to disassemble and interpret,
which, while tedious, is possible.

However, afaik, Stephane has been working with their tools team to get
additional DWARF info to make this easier. Stephane, any updates on
that?