Message-ID: <YjiuoEUL6jH32cBi@google.com>
Date: Mon, 21 Mar 2022 17:58:08 +0100
From: "Steinar H. Gunderson" <sesse@...gle.com>
To: Adrian Hunter <adrian.hunter@...el.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Namhyung Kim <namhyung@...nel.org>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] perf intel-pt: Synthesize cycle events
On Mon, Mar 21, 2022 at 03:09:08PM +0200, Adrian Hunter wrote:
> Yes, it can cross calls and returns. 'returns' due to "Return Compression"
> which can be switched off at record time with config term noretcomp, but
> that may cause more overflows / trace data loss.
>
> To get accurate times for a single function there is Intel PT
> address filtering.
>
> Otherwise LBRs can have cycle times.
Many interesting points; I'll be sure to look into them.
Meanwhile, should I send a new patch with your latest changes?
It's more your patch than mine now, it seems, but I think we're
converging on something useful.
>> By the way, I noticed that synthesized call stacks do not respect
>> --inline; is that on purpose? The patch seems simple enough (just
>> a call to add_inlines), although it exposes extreme slowness in libbfd
>> when run over large binaries, which I'll have to look into.
>> (10+ ms for each address-to-symbol lookup is rather expensive when you
>> have 4M samples to churn through!)
> No, not on purpose.
The patch appears to be trivial:
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -44,6 +44,7 @@
#include <linux/zalloc.h>
static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock);
+static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
static struct dso *machine__kernel_dso(struct machine *machine)
{
@@ -2217,6 +2218,10 @@ static int add_callchain_ip(struct thread *thread,
ms.maps = al.maps;
ms.map = al.map;
ms.sym = al.sym;
+
+ if (append_inlines(cursor, &ms, ip) == 0)
+ return 0;
+
srcline = callchain_srcline(&ms, al.addr);
return callchain_cursor_append(cursor, ip, &ms,
branch, flags, nr_loop_iter,
but I'm seeing problems with “failing to process sample” when I go from
10us to 1us, so I'll have to look into that.
I've sent some libbfd patches to try to help with the slowness:
https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=30cbd32aec30b4bc13427bbd87c4c63c739d4578
https://sourceware.org/pipermail/binutils/2022-March/120131.html
https://sourceware.org/pipermail/binutils/2022-March/120133.html
/* Steinar */