lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aIkLlB7Z7V--BeGi@J2N7QTR9R3.cambridge.arm.com>
Date: Tue, 29 Jul 2025 18:57:40 +0100
From: Mark Rutland <mark.rutland@....com>
To: Jiri Olsa <jolsa@...nel.org>
Cc: Steven Rostedt <rostedt@...nel.org>, Florent Revest <revest@...gle.com>,
	bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-trace-kernel@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org,
	Alexei Starovoitov <ast@...nel.org>,
	Daniel Borkmann <daniel@...earbox.net>,
	Andrii Nakryiko <andrii@...nel.org>,
	Menglong Dong <menglong8.dong@...il.com>,
	Naveen N Rao <naveen@...nel.org>,
	Michael Ellerman <mpe@...erman.id.au>,
	Björn Töpel <bjorn@...osinc.com>,
	Andy Chiu <andybnac@...il.com>,
	Alexandre Ghiti <alexghiti@...osinc.com>,
	Palmer Dabbelt <palmer@...belt.com>
Subject: Re: [RFC 00/10] ftrace,bpf: Use single direct ops for bpf trampolines

Hi Jiri,

[adding some powerpc and riscv folk, see below]

On Tue, Jul 29, 2025 at 12:28:03PM +0200, Jiri Olsa wrote:
> hi,
> while poking the multi-tracing interface I ended up with just one
> ftrace_ops object to attach all trampolines.
> 
> This change allows to use less direct API calls during the attachment
> changes in the future code, so in effect speeding up the attachment.

How important is that, and what sort of speedup does this result in? I
ask due to potential performance hits noted below, and I'm lacking
context as to why we want to do this in the first place -- what is this
intended to enable/improve?

> However having just single ftrace_ops object removes direct_call
> field from direct_call, which is needed by arm, so I'm not sure
> it's the right path forward.

It's also needed by powerpc and riscv since commits:

  a52f6043a2238d65 ("powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS")
  b21cdb9523e5561b ("riscv: ftrace: support direct call using call_ops")

> Mark, Florent,
> any idea how hard would it be to for arm to get rid of direct_call field?

For architectures which follow the arm64 style of implementation, it's
pretty hard to get rid of it without introducing a performance hit to
the call and/or a hit to attachment/detachment/modification. It would
also end up being a fair amount more complicated.

There's some historical rationale at:

  https://lore.kernel.org/lkml/ZfBbxPDd0rz6FN2T@FVFF77S0Q05N/

... but the gist is that for several reasons we want the ops pointer in
the callsite, and for atomic modification of this to switch everything
dependent on that ops atomically, as this keeps the call logic and
attachment/detachment/modification logic simple and pretty fast.

If we remove the direct_call pointer from the ftrace_ops, then IIUC our
options include:

* Point the callsite pointer at some intermediate structure which points
  to the ops (e.g. the dyn_ftrace for the callsite). That introduces an
  additional dependent load per call that needs the ops, and introduces
  potential incoherency with other fields in that structure, requiring
  more synchronization overhead for attachment/detachment/modification.

* Point the callsite pointer at a trampoline which can generate the ops
  pointer. This requires that every ops has a trampoline even for
  non-direct usage, which then requires more memory / I$, has more
  potential failure points, and is generally more complicated. The
  performance here will vary by architecture and platform, on some this
  might be faster, on some it might be slower.

  Note that we probably still need to bounce through an intermediary
  trampoline here to actually load from the callsite pointer and
  indirectly branch to it.

... but I'm not really keen on either unless we really have to remove 
the ftrace_ops::direct_call field, since they come with a substantial
jump in complexity.

Mark.


> 
> thougts? thanks,
> jirka
> 
> 
> ---
> Jiri Olsa (10):
>       ftrace: Make alloc_and_copy_ftrace_hash direct friendly
>       ftrace: Add register_ftrace_direct_hash function
>       ftrace: Add unregister_ftrace_direct_hash function
>       ftrace: Add modify_ftrace_direct_hash function
>       ftrace: Export some of hash related functions
>       ftrace: Use direct hash interface in direct functions
>       bpf: Add trampoline ip hash table
>       ftrace: Factor ftrace_ops ops_func interface
>       bpf: Remove ftrace_ops from bpf_trampoline object
>       Revert "ftrace: Store direct called addresses in their ops"
> 
>  include/linux/bpf.h           |   8 +-
>  include/linux/ftrace.h        |  51 ++++++++++---
>  kernel/bpf/trampoline.c       |  94 +++++++++++++-----------
>  kernel/trace/ftrace.c         | 481 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------------------
>  kernel/trace/trace.h          |   8 --
>  kernel/trace/trace_selftest.c |   5 +-
>  6 files changed, 395 insertions(+), 252 deletions(-)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ