Message-ID: <CADxym3brzU=npXwSNUA7x1bCwyyyqgR49LwUzgxeka6ss6Jzrw@mail.gmail.com>
Date: Mon, 28 Jul 2025 22:36:42 +0800
From: Menglong Dong <menglong8.dong@...il.com>
To: Jiri Olsa <olsajiri@...il.com>
Cc: alexei.starovoitov@...il.com, mhiramat@...nel.org, rostedt@...dmis.org,
mathieu.desnoyers@...icios.com, hca@...ux.ibm.com, revest@...omium.org,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
bpf@...r.kernel.org
Subject: Re: [PATCH bpf-next 0/4] fprobe: use rhashtable for fprobe_ip_table
On Mon, Jul 28, 2025 at 9:14 PM Jiri Olsa <olsajiri@...il.com> wrote:
>
> On Mon, Jul 28, 2025 at 12:12:47PM +0800, Menglong Dong wrote:
> > For now, the bucket count of the hash table used for fprobe_ip_table is
> > fixed at 256, which can cause huge overhead when a huge quantity of
> > functions is hooked.
> >
> > In this series, we use rhashtable for fprobe_ip_table to reduce the
> > overhead.
> >
> > Meanwhile, we also add the benchmark testcase "kprobe-multi-all", which
> > will hook all the kernel functions during the testing. Before this series,
> > the performance is:
> > usermode-count : 875.380 ± 0.366M/s
> > kernel-count : 435.924 ± 0.461M/s
> > syscall-count : 31.004 ± 0.017M/s
> > fentry : 134.076 ± 1.752M/s
> > fexit : 68.319 ± 0.055M/s
> > fmodret : 71.530 ± 0.032M/s
> > rawtp : 202.751 ± 0.138M/s
> > tp : 79.562 ± 0.084M/s
> > kprobe : 55.587 ± 0.028M/s
> > kprobe-multi : 56.481 ± 0.043M/s
> > kprobe-multi-all: 6.283 ± 0.005M/s << note this
> > kretprobe : 22.378 ± 0.028M/s
> > kretprobe-multi: 28.205 ± 0.025M/s
> >
> > With this series, the performance is:
> > usermode-count : 897.083 ± 5.347M/s
> > kernel-count : 431.638 ± 1.781M/s
> > syscall-count : 30.807 ± 0.057M/s
> > fentry : 134.803 ± 1.045M/s
> > fexit : 68.763 ± 0.018M/s
> > fmodret : 71.444 ± 0.052M/s
> > rawtp : 202.344 ± 0.149M/s
> > tp : 79.644 ± 0.376M/s
> > kprobe : 55.480 ± 0.108M/s
> > kprobe-multi : 57.302 ± 0.119M/s
> > kprobe-multi-all: 57.855 ± 0.144M/s << note this
>
> nice, so we still trigger one function, but have all possible
> functions attached, right?
Yes. The test case can be improved further. For now,
I attach the prog bench_trigger_kprobe_multi to all the kernel
functions and trigger the benchmark. There can be some noise,
as the prog also fires on every other kernel function call,
which can inflate the benchmark results. However, it does not
make much difference.
A better choice would be to attach an empty kprobe_multi prog
to all the kernel functions except bpf_get_numa_node_id, and
attach bench_trigger_kprobe_multi to bpf_get_numa_node_id only,
which would make the results more accurate.
>
> thanks,
> jirka
>
>
> > kretprobe : 22.265 ± 0.023M/s
> > kretprobe-multi: 27.740 ± 0.023M/s
> >
> > The "kprobe-multi-all" benchmark increases from 6.283M/s to 57.855M/s.
> >
> > Menglong Dong (4):
> > fprobe: use rhashtable
> > selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c
> > selftests/bpf: add benchmark testing for kprobe-multi-all
> > selftests/bpf: skip recursive functions for kprobe_multi
> >
> > include/linux/fprobe.h | 2 +-
> > kernel/trace/fprobe.c | 144 ++++++-----
> > tools/testing/selftests/bpf/bench.c | 2 +
> > .../selftests/bpf/benchs/bench_trigger.c | 30 +++
> > .../selftests/bpf/benchs/run_bench_trigger.sh | 2 +-
> > .../bpf/prog_tests/kprobe_multi_test.c | 220 +----------------
> > tools/testing/selftests/bpf/trace_helpers.c | 230 ++++++++++++++++++
> > tools/testing/selftests/bpf/trace_helpers.h | 3 +
> > 8 files changed, 351 insertions(+), 282 deletions(-)
> >
> > --
> > 2.50.1
> >
> >