[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aIkLlB7Z7V--BeGi@J2N7QTR9R3.cambridge.arm.com>
Date: Tue, 29 Jul 2025 18:57:40 +0100
From: Mark Rutland <mark.rutland@....com>
To: Jiri Olsa <jolsa@...nel.org>
Cc: Steven Rostedt <rostedt@...nel.org>, Florent Revest <revest@...gle.com>,
bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Menglong Dong <menglong8.dong@...il.com>,
Naveen N Rao <naveen@...nel.org>,
Michael Ellerman <mpe@...erman.id.au>,
Björn Töpel <bjorn@...osinc.com>,
Andy Chiu <andybnac@...il.com>,
Alexandre Ghiti <alexghiti@...osinc.com>,
Palmer Dabbelt <palmer@...belt.com>
Subject: Re: [RFC 00/10] ftrace,bpf: Use single direct ops for bpf trampolines
Hi Jiri,
[adding some powerpc and riscv folk, see below]
On Tue, Jul 29, 2025 at 12:28:03PM +0200, Jiri Olsa wrote:
> hi,
> while poking the multi-tracing interface I ended up with just one
> ftrace_ops object to attach all trampolines.
>
> This change allows to use less direct API calls during the attachment
> changes in the future code, so in effect speeding up the attachment.
How important is that, and what sort of speedup does this result in? I
ask due to potential performance hits noted below, and I'm lacking
context as to why we want to do this in the first place -- what is this
intended to enable/improve?
> However having just single ftrace_ops object removes direct_call
> field from direct_call, which is needed by arm, so I'm not sure
> it's the right path forward.
It's also needed by powerpc and riscv since commits:
a52f6043a2238d65 ("powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS")
b21cdb9523e5561b ("riscv: ftrace: support direct call using call_ops")
> Mark, Florent,
> any idea how hard would it be to for arm to get rid of direct_call field?
For architectures which follow the arm64 style of implementation, it's
pretty hard to get rid of it without introducing a performance hit to
the call and/or a hit to attachment/detachment/modification. It would
also end up being a fair amount more complicated.
There's some historical rationale at:
https://lore.kernel.org/lkml/ZfBbxPDd0rz6FN2T@FVFF77S0Q05N/
... but the gist is that for several reasons we want the ops pointer in
the callsite, and for atomic modification of this to switch everything
dependent on that ops atomically, as this keeps the call logic and
attachment/detachment/modification logic simple and pretty fast.
If we remove the direct_call pointer from the ftrace_ops, then IIUC our
options include:
* Point the callsite pointer at some intermediate structure which points
to the ops (e.g. the dyn_ftrace for the callsite). That introduces an
additional dependent load per call that needs the ops, and introduces
potential incoherency with other fields in that structure, requiring
more synchronization overhead for attachment/detachment/modification.
* Point the callsite pointer at a trampoline which can generate the ops
pointer. This requires that every ops has a trampoline even for
non-direct usage, which then requires more memory / I$, has more
potential failure points, and is generally more complicated. The
performance here will vary by architecture and platform, on some this
might be faster, on some it might be slower.
Note that we probably still need to bounce through an intermediary
trampoline here to actually load from the callsite pointer and
indirectly branch to it.
... but I'm not really keen on either unless we really have to remove
the ftrace_ops::direct_call field, since they come with a substantial
jump in complexity.
Mark.
>
> thougts? thanks,
> jirka
>
>
> ---
> Jiri Olsa (10):
> ftrace: Make alloc_and_copy_ftrace_hash direct friendly
> ftrace: Add register_ftrace_direct_hash function
> ftrace: Add unregister_ftrace_direct_hash function
> ftrace: Add modify_ftrace_direct_hash function
> ftrace: Export some of hash related functions
> ftrace: Use direct hash interface in direct functions
> bpf: Add trampoline ip hash table
> ftrace: Factor ftrace_ops ops_func interface
> bpf: Remove ftrace_ops from bpf_trampoline object
> Revert "ftrace: Store direct called addresses in their ops"
>
> include/linux/bpf.h | 8 +-
> include/linux/ftrace.h | 51 ++++++++++---
> kernel/bpf/trampoline.c | 94 +++++++++++++-----------
> kernel/trace/ftrace.c | 481 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------------------
> kernel/trace/trace.h | 8 --
> kernel/trace/trace_selftest.c | 5 +-
> 6 files changed, 395 insertions(+), 252 deletions(-)
Powered by blists - more mailing lists