Message-ID: <vhiptl7wtjmitacwuvtacrzfawixttl7bdblc3ozeyzskqrhid@jjev6i3z3cj7>
Date: Tue, 7 Jan 2025 11:32:33 -0700
From: Daniel Xu <dxu@...uu.xyz>
To: Yafang Shao <laoar.shao@...il.com>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>, 
	Andrii Nakryiko <andrii@...nel.org>, Eddy Z <eddyz87@...il.com>, Alexei Starovoitov <ast@...nel.org>, 
	Daniel Borkmann <daniel@...earbox.net>, Martin KaFai Lau <martin.lau@...ux.dev>, 
	Song Liu <song@...nel.org>, Yonghong Song <yonghong.song@...ux.dev>, 
	John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>, 
	Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, 
	Eric Dumazet <edumazet@...gle.com>, bpf <bpf@...r.kernel.org>, 
	Network Development <netdev@...r.kernel.org>
Subject: Re: [RFC PATCH bpf-next 1/2] libbpf: Add support for dynamic
 tracepoint

On Tue, Jan 07, 2025 at 10:41:46AM +0800, Yafang Shao wrote:
> On Tue, Jan 7, 2025 at 6:33 AM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Sun, Jan 5, 2025 at 6:32 PM Yafang Shao <laoar.shao@...il.com> wrote:
> > >
> > > On Mon, Jan 6, 2025 at 8:16 AM Alexei Starovoitov
> > > <alexei.starovoitov@...il.com> wrote:
> > > >
> > > > On Sun, Jan 5, 2025 at 4:44 AM Yafang Shao <laoar.shao@...il.com> wrote:
> > > > >
> > > > > Dynamic tracepoints can be created using debugfs. For example:
> > > > >
> > > > >    echo 'p:myprobe kernel_clone args' >> /sys/kernel/debug/tracing/kprobe_events
> > > > >
> > > > > This command creates a new tracepoint under debugfs:
> > > > >
> > > > >   $ ls /sys/kernel/debug/tracing/events/kprobes/myprobe/
> > > > >   enable  filter  format  hist  id  trigger
> > > > >
> > > > > Although this dynamic tracepoint appears as a tracepoint, it is internally
> > > > > implemented as a kprobe. However, it must be attached as a tracepoint to
> > > > > function correctly in certain contexts.
> > > >
> > > > Nack.
> > > > There are multiple mechanisms to create kprobe/tp via text interfaces.
> > > > We're not going to mix them with the programmatic libbpf api.
> > >
> > > It appears that bpftrace still lacks support for adding a kprobe/tp
> > > and then attaching to it directly. Is that correct?
> >
> > what do you mean?
> 
> Take the inlined kernel function tcp_listendrop() as an example:
> 
> $ perf probe -a 'tcp_listendrop sk'
> Added new events:
>   probe:tcp_listendrop (on tcp_listendrop with sk)
>   probe:tcp_listendrop (on tcp_listendrop with sk)
>   probe:tcp_listendrop (on tcp_listendrop with sk)
>   probe:tcp_listendrop (on tcp_listendrop with sk)
>   probe:tcp_listendrop (on tcp_listendrop with sk)
>   probe:tcp_listendrop (on tcp_listendrop with sk)
>   probe:tcp_listendrop (on tcp_listendrop with sk)
>   probe:tcp_listendrop (on tcp_listendrop with sk)
> 
> You can now use it in all perf tools, such as:
> 
>         perf record -e probe:tcp_listendrop -aR sleep 1

Cool, I'm guessing perf-probe can speak DWARF and will parse all the
inline information.

> 
> Similarly, we can also use bpftrace to trace inlined kernel functions.
> For example:
> 
> - add a dynamic tracepoint
>   $ bpftrace probe -a 'tcp_listendrop sk'
> 
> - trace the dynamic tracepoint
>   $ bpftrace probe -e 'probe:tcp_listendrop {print(args->sk)}'
> 
> > bpftrace supports both kprobe attaching and tp too.
> 
> The dynamic tracepoint is not supported yet.
> 
> >
> > > What do you think about introducing this mechanism into bpftrace? With
> > > such a feature, we could easily attach to inlined kernel functions
> > > using bpftrace.
> >
> > Attaching to inlined funcs also sort-of works. It relies on dwarf,
> > and there is work in progress to add a special section to vmlinux
> > to annotate inlined sites, so it can work without dwarf.
> 
> What’s the benefit of doing this? Why not simply read the DWARF
> information directly from vmlinux?
> 
> $ readelf -S /boot/vmlinux  | grep debug_info
>   [63] .debug_info       PROGBITS         0000000000000000  03e2bc20
> 
> The DWARF information embedded in vmlinux makes it straightforward to
> trace inlined functions without requiring any kernel modifications.
> This approach allows all existing kernel releases to immediately take
> advantage of the functionality, eliminating the need for kernel
> recompilation or patching.

I'd disagree that this approach works with all existing kernels. Kernel
debuginfo is usually not available by default. On some distros, it's not
available at all.
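
As a rough illustration of the check a tool would have to make before
relying on DWARF, here is a minimal Python sketch that lists the section
names of an ELF image and reports whether .debug_info is present (the same
thing the readelf command quoted above shows). It assumes an ELF64
little-endian file and ignores extended section numbering; the function
names section_names and has_debug_info are hypothetical, not part of any
existing tool.

```python
import struct

# ELF64 little-endian layouts (file header and section header), 64 bytes each.
ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
SECTION_HEADER = struct.Struct("<IIQQQQIIQQ")


def section_names(image):
    """Return the section names of an ELF64 little-endian image (bytes)."""
    if len(image) < ELF_HEADER.size or image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if image[4] != 2 or image[5] != 1:
        raise ValueError("this sketch only handles ELF64 little-endian")

    fields = ELF_HEADER.unpack_from(image, 0)
    shoff, shentsize, shnum, shstrndx = (
        fields[6], fields[11], fields[12], fields[13])
    if shnum == 0:
        # Either no sections or extended numbering, which is not handled here.
        return []

    headers = [SECTION_HEADER.unpack_from(image, shoff + i * shentsize)
               for i in range(shnum)]

    # The section-name string table is itself a section: fields 4 and 5
    # of its header are its file offset and size.
    str_off, str_size = headers[shstrndx][4], headers[shstrndx][5]
    strtab = image[str_off:str_off + str_size]

    names = []
    for header in headers:
        start = header[0]                 # sh_name: offset into strtab
        end = strtab.index(b"\0", start)  # names are NUL-terminated
        names.append(strtab[start:end].decode("ascii", "replace"))
    return names


def has_debug_info(image):
    """True if the image carries a .debug_info section."""
    return ".debug_info" in section_names(image)
```

A caller would read the image from disk, for example
has_debug_info(open("/boot/vmlinux", "rb").read()), and fall back to some
other source of inline information when it returns False. Reading a whole
vmlinux into memory is wasteful; a real tool would read only the headers.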

This is particularly relevant for partial inlining, where the compiler
inlines some call sites but still emits the out-of-line symbol. In these
cases, users who probe the symbol will attach successfully but then
silently lose the events from the inlined call sites. Nothing tells the
user that they need to install debuginfo or create dynamic tracepoints.
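
To make the event loss concrete, here is a toy Python model (not real
tracing code). It assumes a function with several call sites, some inlined
and some not, and counts what a probe on the out-of-line symbol would see.
The CallSite type, the caller names, and the call counts are all made up
for illustration.

```python
from dataclasses import dataclass


@dataclass
class CallSite:
    caller: str     # hypothetical caller name
    inlined: bool   # True if the compiler inlined the callee at this site
    calls: int      # how many times this site executes the callee


def total_calls(sites):
    """Every execution of the function body, inlined or not."""
    return sum(site.calls for site in sites)


def symbol_probe_hits(sites):
    """Executions seen by a probe on the out-of-line symbol only."""
    return sum(site.calls for site in sites if not site.inlined)


def missed_calls(sites):
    """Executions that happen at inlined sites and never hit the symbol."""
    return total_calls(sites) - symbol_probe_hits(sites)


sites = [
    CallSite("caller_a", inlined=True, calls=70),
    CallSite("caller_b", inlined=False, calls=30),
]
print(symbol_probe_hits(sites), "of", total_calls(sites), "events observed")
```

In this model the attach succeeds because the symbol exists, yet the 70
calls made through the inlined site never reach the probe.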

This is the motivation for always-available metadata: something small
enough that distros can leave it on by default, similar to the motivation
for BTF. Parsing DWARF also carries overhead, and a more compact
representation helps reduce it.
