Message-ID: <20170714083340.neiavkoxazrljlos@gmail.com>
Date: Fri, 14 Jul 2017 10:33:40 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Josh Poimboeuf <jpoimboe@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Andres Freund <andres@...razel.de>, x86@...nel.org,
linux-kernel@...r.kernel.org, live-patching@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>, Jiri Slaby <jslaby@...e.cz>,
"H. Peter Anvin" <hpa@...or.com>, Mike Galbraith <efault@....de>,
Jiri Olsa <jolsa@...hat.com>,
Arnaldo Carvalho de Melo <acme@...radead.org>,
Namhyung Kim <namhyung@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>
Subject: Re: [PATCH v3 00/10] x86: ORC unwinder (previously undwarf)
* Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> > > The results wouldn't be 100% accurate, but they could end up being useful
> > > over time.
> >
> > And to expound further on the bad idea, maybe the "bad" addresses could be
> > filtered out somehow in post-processing (insert lots of hand waving).
>
> And some details on the post-processing: in most cases it should be possible to
> determine which of the found stack addresses are valid by looking at the call
> instructions immediately preceding the stack text addresses, and making sure the
> call target points to the same function as the previously found address. But of
> course that wouldn't work for indirect calls.
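
The filtering heuristic quoted above can be sketched roughly as follows. This is
only a toy model, not anything from the patch set: the symbol table, the
CALL_TARGET_BEFORE map (standing in for decoding the call instruction that
precedes each candidate address) and all addresses are made up for illustration.

```python
import bisect

# Hypothetical symbol table: sorted (start_address, name) pairs.
SYMBOLS = [(0x1000, "a"), (0x2000, "b"), (0x3000, "c")]
STARTS = [start for start, _ in SYMBOLS]

def func_of(addr):
    """Return the name of the function containing addr, or None."""
    i = bisect.bisect_right(STARTS, addr) - 1
    return SYMBOLS[i][1] if i >= 0 else None

# Hypothetical decode result: candidate return address -> target of the
# direct call immediately preceding it. None marks an indirect call,
# whose target cannot be determined from the instruction bytes alone.
CALL_TARGET_BEFORE = {0x2020: 0x3000, 0x1050: 0x3000, 0x1010: 0x2000, 0x2040: None}

def filter_candidates(leaf_ip, candidates):
    """Keep candidates whose preceding call targets the function that
    contains the previously accepted address (starting from leaf_ip)."""
    kept = []
    prev = leaf_ip
    for ret in candidates:
        target = CALL_TARGET_BEFORE.get(ret)
        if target is None:
            continue  # indirect or undecodable call: cannot be verified
        if func_of(target) == func_of(prev):
            kept.append(ret)
            prev = ret
    return kept

# Stack scan found three text addresses; 0x1050 is a stale leftover.
chain = filter_candidates(0x3008, [0x2020, 0x1050, 0x1010])
```

Dropping unverifiable candidates is one possible policy; it shows why indirect
calls are the weak spot, since a genuine frame reached through one would be
discarded as well.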

I believe this is similar to how OProfile did graph/dwarf profiling, by saving a
copy of the stack and post-processing it.
By my best recollection (but I haven't used OProfile that much) it was a
performance nightmare, it was limited (because it only saved a part of the
stack), and it was rather fragile as well, because it depended on the task VM
being post-processable.
I think the highest quality implementation is to generate the call trace either in
hardware (LBR), or as close to the event as possible: generate the kernel call
chain in the PMI context, and the user-space call chain before user-space executes
again (at the latest). Call chain generation should be roughly O(chain_depth),
which both FP and ORC ensure.
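
To illustrate the O(chain_depth) point for the FP case, here is a minimal sketch
of a frame pointer walk over a simulated stack. It assumes the usual x86-64
layout (saved frame pointer at [fp], return address at [fp + 8]); the stack
contents, addresses and the walk_frame_pointers name are invented for the
example, and ORC's table lookup is not modeled.

```python
def walk_frame_pointers(stack, fp, max_depth=64):
    """Follow the chain of saved frame pointers, doing one step per frame.

    stack maps an address to the 8-byte word stored there.
    """
    chain = []
    while fp in stack and len(chain) < max_depth:
        saved_fp = stack[fp]       # caller's frame pointer
        ret = stack.get(fp + 8)    # return address into the caller
        if ret is None:
            break
        chain.append(ret)
        if saved_fp <= fp:
            break  # stack grows down, so a caller frame must sit higher
        fp = saved_fp
    return chain

# Three frames; the outermost one has a zero saved frame pointer.
stack = {
    0x7000: 0x7020, 0x7008: 0x2020,
    0x7020: 0x7040, 0x7028: 0x1010,
    0x7040: 0x0,    0x7048: 0x0500,
}
chain = walk_frame_pointers(stack, 0x7000)
```

Each loop iteration consumes exactly one frame, so the cost scales with the
depth of the chain rather than with the number of bytes on the stack, which is
what a scan-and-filter approach would have to pay.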
Thanks,
Ingo