Message-ID: <20201008070231.GS2628@hirez.programming.kicks-ass.net>
Date: Thu, 8 Oct 2020 09:02:31 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Stephane Eranian <eranian@...gle.com>
Cc: linux-toolchains@...r.kernel.org,
Arnaldo Carvalho de Melo <acme@...nel.org>,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...nel.org>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Ian Rogers <irogers@...gle.com>,
"Phillips, Kim" <kim.phillips@....com>,
Mark Rutland <mark.rutland@....com>,
Andi Kleen <andi@...stfloor.org>,
Masami Hiramatsu <mhiramat@...nel.org>
Subject: Re: Additional debug info to aid cacheline analysis
My apologies for adding a typo to the linux-kernel address, corrected
now.
On Wed, Oct 07, 2020 at 10:58:00PM -0700, Stephane Eranian wrote:
> Hi Peter,
>
> On Tue, Oct 6, 2020 at 6:17 AM Peter Zijlstra <peterz@...radead.org> wrote:
> >
> > Hi all,
> >
> > I've been trying to float this idea for a fair number of years, and I
> > think at least Stephane has been talking to tools people about it, but
> > I'm not sure what, if anything, ever happened with it, so let me post it
> > here :-)
> >
> >
> Thanks for bringing this back. This is a pet project of mine, and I
> have been looking at it intermittently for the last 4 years. I simply
> never got a chance to complete it because I was preempted by other
> higher-priority projects. I have developed an internal
> proof-of-concept prototype using one of the 3 approaches I know of. My
> goal was to demonstrate that PMU statistical sampling of loads/stores
> with data addresses would work as well as instrumentation. This is
> slightly different from hit/miss in the analysis, but the process is
> the same.
>
> As you point out, the difficulty is not so much in collecting the
> sample but rather in symbolizing data addresses from the heap.
Right, that's non-trivial. For static and per-cpu objects it should be
rather straightforward, but heap objects are going to be a pain. You'd
basically have to also log the alloc/free of every object along with
the data type used for it, which is not something we have readily
available at the allocator.
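To sketch what that would take (everything below is made up for
illustration; no such record exists in the allocator today), you would
want roughly one record per alloc/free event:

	#include <stdint.h>

	/*
	 * Hypothetical per-object record; the struct and field names
	 * are invented for illustration, this is not an existing
	 * kernel or perf ABI.
	 */
	struct alloc_type_record {
		uint64_t addr;    /* object start address */
		uint64_t size;    /* object size in bytes */
		uint64_t type_id; /* index into a DWARF/BTF type table */
		uint64_t time;    /* to match samples to object lifetime */
		uint8_t  is_free; /* 0 = alloc, 1 = free */
	};

The awkward bit is type_id: the allocator only sees a size, so the
type would have to come from the call site, which is exactly the
information that is not readily available today.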
> Intel PEBS and IBM Marked Events work well to collect the data. AMD
> IBS works, though you get a lot of irrelevant samples due to the lack
> of hardware filtering. ARM SPE would work too. Overall, all the major
> architectures will provide the sampling support needed.
That's for the data address, or also the eventing IP?
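For reference, the collection side largely exists today; both perf mem
and perf c2c record the sampled instruction pointer together with the
data address (PEBS/IBS/SPE underneath):

	# memory access sampling; samples carry IP and data address
	perf mem record -a -- sleep 10
	perf mem report

	# cacheline contention analysis on the same kind of samples
	perf c2c record -a -- sleep 10
	perf c2c report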
> Some time ago, I had my intern pursue the other 2 approaches to
> symbolization. The one I see as most promising is using the DWARF
> information (no BPF needed). The good news is that I believe we do not
> need more information than what is already there. We just need the
> compiler to generate valid DWARF at most optimization levels, which I
> believe is not the case for LLVM-based compilers but may be okay for
> GCC.
Right, I think GCC improved a lot on this front over the past few years.
Also added Andi and Masami, who have worked on this or related topics.
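For the static case, the DWARF we get today already carries everything
needed; a toy example (illustrative only):

	/*
	 * The DWARF for this unit describes both the object's address
	 * and its layout.
	 */
	struct stats {
		long hits;   /* DW_AT_data_member_location = 0 */
		long misses; /* offset 8 on LP64 */
	};

	/*
	 * DW_TAG_variable whose DW_AT_location (DW_OP_addr) gives the
	 * static address.
	 */
	struct stats global_stats;

Dumping that with readelf --debug-dump=info shows the lookup chain:
find the DW_TAG_variable whose address plus type size covers the
sampled data address, then follow DW_AT_type down to the member whose
DW_AT_data_member_location matches the offset. The concern above is
precisely that at higher optimization levels this information has
tended to come out incomplete or wrong.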
> Once we have the DWARF logic in place, it becomes easier to improve
> perf report/annotate to do hit/miss or hot/cold, read/write analysis
> on each data type and the fields within.
>
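To make the report side concrete: once an object and its type are
known, per-field attribution reduces to something like the helper
below (the table structures are invented for illustration; this is
not actual perf code):

	#include <stdint.h>

	/*
	 * Hypothetical field/type tables as perf might build them
	 * from DWARF; invented for illustration.
	 */
	struct field_desc {
		const char *name;
		uint64_t offset; /* from the start of the object */
		uint64_t size;
	};

	struct type_desc {
		const char *name;
		const struct field_desc *fields;
		unsigned int nr_fields;
	};

	/* Map a sampled data address to the field of the object hit. */
	static const struct field_desc *
	resolve_field(const struct type_desc *type, uint64_t obj_start,
		      uint64_t addr)
	{
		uint64_t off = addr - obj_start;
		unsigned int i;

		for (i = 0; i < type->nr_fields; i++) {
			const struct field_desc *f = &type->fields[i];

			if (off >= f->offset && off < f->offset + f->size)
				return f; /* bump hit/miss stats here */
		}
		return NULL;
	}

Hot/cold and read/write breakdowns then just accumulate per returned
field.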
> Once we have the code for perf, we are planning to contribute it upstream.
>
> In the meantime, we need to lean on the compiler teams to ensure no
> data type information is lost at high optimization levels. My
> understanding from talking with some compiler folks is that this is
> not a trivial fix.
As you might have noticed, I send this to the linux-toolchains list.
While you lean on your compiler folks, try and get them subscribed to
this list. It is meant to discuss toolchain issues as related to Linux.
Both GCC/binutils and LLVM should be represented here.