[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAD=FV=U1ChMU5dT4kBSTcsoePo7VUEQuoYtRUJuHBdVcgbS0XA@mail.gmail.com>
Date: Mon, 7 Jun 2021 13:57:19 -0700
From: Doug Anderson <dianders@...omium.org>
To: James Clark <james.clark@....com>
Cc: Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Seth LaForge <sethml@...gle.com>,
Ricky Liang <jcliang@...omium.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Ingo Molnar <mingo@...hat.com>, Jiri Olsa <jolsa@...hat.com>,
Mark Rutland <mark.rutland@....com>,
Namhyung Kim <namhyung@...nel.org>,
Nathan Chancellor <nathan@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
clang-built-linux <clang-built-linux@...glegroups.com>,
Linux ARM <linux-arm-kernel@...ts.infradead.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-perf-users@...r.kernel.org,
Alexandre Truong <alexandre.truong@....com>,
Wilco Dijkstra <wilco.dijkstra@....com>,
Al Grant <Al.Grant@....com>
Subject: Re: [PATCH 2/3] arm64: perf: Improve compat perf_callchain_user() for
clang leaf functions
Hi,
On Mon, Jun 7, 2021 at 2:14 AM James Clark <james.clark@....com> wrote:
>
>
>
> On 07/05/2021 23:55, Douglas Anderson wrote:
> > It turns out that even when you compile code with clang with
> > "-fno-omit-frame-pointer" that it won't generate a frame pointer for
> > leaf functions (those that don't call any sub-functions). Presumably
> > clang does this to reduce the overhead of frame pointers. In a leaf
> > function you don't really need frame pointers since the Link Register
> > (LR) is guaranteed to always point to the caller>
> [...]
> >
> > arch/arm64/kernel/perf_callchain.c | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> > diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
> > index e5ce5f7965d1..b3cd9f371469 100644
> > --- a/arch/arm64/kernel/perf_callchain.c
> > +++ b/arch/arm64/kernel/perf_callchain.c
> > @@ -326,6 +326,20 @@ static void compat_perf_callchain_user(struct perf_callchain_entry_ctx *entry,
> > while ((entry->nr < entry->max_stack) && fp && !(fp & 0x3)) {
> > err = compat_perf_trace_1(&fp, &pc, leaf_lr);
> >
> > + /*
> > + * If this is the first trace and it didn't find the LR then
> > + * let's throw it in the trace first. This isn't perfect but
> > + * is the best we can do for handling clang leaf functions (or
> > + * the case where we're right at the start of the function
> > + * before the new frame has been pushed). In the worst case
> > + * this can cause us to throw an extra entry that will be some
> > + * location in the same function as the PC. That's not
> > + * amazing but shouldn't really hurt. It seems better than
> > + * throwing away the LR.
> > + */
>
> Hi Douglas,
>
> I think the behaviour with GCC is also similar. We were working on this change
> (https://lore.kernel.org/lkml/20210304163255.10363-4-alexandre.truong@arm.com/)
> in userspace Perf which addresses the same issue.
>
> The basic concept of our version is to record only the link register
> (as in --user-regs=lr). Then use the existing dwarf based unwind
> to determine if the link register is valid for that frame, and then if
> it is and it doesn't already exist on the stack then insert it.
>
> You mention that your version isn't perfect, do you think that saving the
> LR and using something like libunwind in a post process could be better?
Using post processing atop a patch to always save the LR is definitely
the right solution IMO and (I think) you can fully overcome the "no
frame pointers in leaf functions" with the post processing.
-Doug
Powered by blists - more mailing lists