Message-ID: <CADxym3b=-tGOVqnoPeDb0q3EZZ-DMjkM0fiaSS6=Q+y07azYMg@mail.gmail.com>
Date: Mon, 28 Jul 2025 22:26:27 +0800
From: Menglong Dong <menglong8.dong@...il.com>
To: Masami Hiramatsu <mhiramat@...nel.org>
Cc: alexei.starovoitov@...il.com, rostedt@...dmis.org,
mathieu.desnoyers@...icios.com, hca@...ux.ibm.com, revest@...omium.org,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
bpf@...r.kernel.org
Subject: Re: [PATCH RFC bpf-next v2 0/4] fprobe: use rhashtable for fprobe_ip_table
On Mon, Jul 28, 2025 at 8:35 PM Masami Hiramatsu <mhiramat@...nel.org> wrote:
>
> Hi Menglong,
>
> What are the updates from v1? Just adding RFC?
No. V1 used rhashtable, which is wrong because it forces each function
address to be unique in the hash table.
V2 uses rhltable instead, which supports duplicate keys.
Sorry, I forgot to add the changelog :/
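
To make the difference concrete, here is a minimal userspace sketch. It is
not the kernel code: struct probe_node, insert_unique(), insert_dup() and
count_probes() are hypothetical names, and the table is a tiny fixed-size
array of chains. insert_unique() mimics a table that rejects a second entry
for the same function address, while insert_dup() mimics a table that lets
several probes share one address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8	/* tiny fixed size, for illustration only */

/* Hypothetical stand-in for one probe attached to one function address. */
struct probe_node {
	uintptr_t addr;			/* key: address of the hooked function */
	int probe_id;			/* which probe owns this node */
	struct probe_node *next;	/* next node in the same bucket */
};

struct table {
	struct probe_node *buckets[NBUCKETS];
};

static size_t bucket_of(uintptr_t addr)
{
	return (size_t)(addr >> 4) % NBUCKETS;
}

/* Unique-key insert: fails if the address is already present. */
static int insert_unique(struct table *t, struct probe_node *n)
{
	size_t b = bucket_of(n->addr);

	for (struct probe_node *p = t->buckets[b]; p; p = p->next)
		if (p->addr == n->addr)
			return -1;
	n->next = t->buckets[b];
	t->buckets[b] = n;
	return 0;
}

/* Duplicate-friendly insert: several nodes may share one address. */
static int insert_dup(struct table *t, struct probe_node *n)
{
	size_t b = bucket_of(n->addr);

	n->next = t->buckets[b];
	t->buckets[b] = n;
	return 0;
}

/* Count how many probes are registered on a given address. */
static int count_probes(const struct table *t, uintptr_t addr)
{
	int count = 0;

	for (const struct probe_node *p = t->buckets[bucket_of(addr)]; p; p = p->next)
		if (p->addr == addr)
			count++;
	return count;
}

/* Two probes on the same (made-up) address 0x1000. */
static void example(void)
{
	struct table t = { 0 };
	struct probe_node a = { 0x1000, 1, NULL };
	struct probe_node b = { 0x1000, 2, NULL };

	assert(insert_unique(&t, &a) == 0);
	assert(insert_unique(&t, &b) == -1);	/* second probe is rejected */
	assert(insert_dup(&t, &b) == 0);	/* accepted when duplicates are allowed */
	assert(count_probes(&t, 0x1000) == 2);
}
```

With the unique-key variant, the second probe on the same function cannot be
stored, which is the problem described above for V1.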
>
> Thanks,
>
> On Mon, 28 Jul 2025 15:22:49 +0800
> Menglong Dong <menglong8.dong@...il.com> wrote:
>
> > For now, the number of buckets in the hash table used for fprobe_ip_table
> > is fixed at 256, which can cause huge overhead when a large number of
> > functions are hooked.
> >
> > In this series, we use rhltable for fprobe_ip_table to reduce the
> > overhead.
> >
> > Meanwhile, we also add the benchmark testcase "kprobe-multi-all", which
> > will hook all the kernel functions during the testing. Before this series,
> > the performance is:
> > usermode-count : 875.380 ± 0.366M/s
> > kernel-count : 435.924 ± 0.461M/s
> > syscall-count : 31.004 ± 0.017M/s
> > fentry : 134.076 ± 1.752M/s
> > fexit : 68.319 ± 0.055M/s
> > fmodret : 71.530 ± 0.032M/s
> > rawtp : 202.751 ± 0.138M/s
> > tp : 79.562 ± 0.084M/s
> > kprobe : 55.587 ± 0.028M/s
> > kprobe-multi : 56.481 ± 0.043M/s
> > kprobe-multi-all: 6.283 ± 0.005M/s << look at this
> > kretprobe : 22.378 ± 0.028M/s
> > kretprobe-multi: 28.205 ± 0.025M/s
> >
> > With this series, the performance is:
> > usermode-count : 902.387 ± 0.762M/s
> > kernel-count : 427.356 ± 0.368M/s
> > syscall-count : 30.830 ± 0.016M/s
> > fentry : 135.554 ± 0.064M/s
> > fexit : 68.317 ± 0.218M/s
> > fmodret : 70.633 ± 0.275M/s
> > rawtp : 193.404 ± 0.346M/s
> > tp : 80.236 ± 0.068M/s
> > kprobe : 55.200 ± 0.359M/s
> > kprobe-multi : 54.304 ± 0.092M/s
> > kprobe-multi-all: 54.487 ± 0.035M/s << look at this
> > kretprobe : 22.381 ± 0.075M/s
> > kretprobe-multi: 27.926 ± 0.034M/s
> >
> > The "kprobe-multi-all" benchmark result increases from 6.283M/s to 54.487M/s.
> >
> > The locking is not handled properly in the first patch. In
> > fprobe_entry, we should use RCU when we access the rhlist_head. However,
> > we can't use RCU for __fprobe_handler, as it can sleep. In the original
> > logic, it seems that the use of hlist_for_each_entry_from_rcu() is not
> > protected by rcu_read_lock either, is it? I don't know how to handle
> > this part ;(
> >
> > Menglong Dong (4):
> > fprobe: use rhltable for fprobe_ip_table
> > selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c
> > selftests/bpf: skip recursive functions for kprobe_multi
> > selftests/bpf: add benchmark testing for kprobe-multi-all
> >
> > include/linux/fprobe.h | 2 +-
> > kernel/trace/fprobe.c | 141 ++++++-----
> > tools/testing/selftests/bpf/bench.c | 2 +
> > .../selftests/bpf/benchs/bench_trigger.c | 30 +++
> > .../selftests/bpf/benchs/run_bench_trigger.sh | 2 +-
> > .../bpf/prog_tests/kprobe_multi_test.c | 220 +----------------
> > tools/testing/selftests/bpf/trace_helpers.c | 230 ++++++++++++++++++
> > tools/testing/selftests/bpf/trace_helpers.h | 3 +
> > 8 files changed, 348 insertions(+), 282 deletions(-)
> >
> > --
> > 2.50.1
> >
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@...nel.org>
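
The overhead described in the quoted cover letter can be estimated with a
little arithmetic. The sketch below is only a back-of-the-envelope
illustration: the helper names are hypothetical, and the figure of 50000
hooked functions is an assumption for the example, not a number from this
thread. The two throughput values are the "kprobe-multi-all" results quoted
above.

```c
#include <assert.h>

/* Average entries per bucket if n_funcs keys spread evenly over n_buckets. */
static double avg_chain_len(unsigned long n_funcs, unsigned long n_buckets)
{
	return (double)n_funcs / (double)n_buckets;
}

/* Ratio between two throughput results expressed in the same unit. */
static double speedup(double before, double after)
{
	return after / before;
}

static void example(void)
{
	/* ASSUMPTION: 50000 hooked functions, chosen only for illustration. */
	double chain = avg_chain_len(50000, 256);

	/* With 256 fixed buckets, each lookup walks a long chain on average. */
	assert(chain > 195.0 && chain < 196.0);

	/* kprobe-multi-all: 6.283M/s before the series, 54.487M/s after. */
	double ratio = speedup(6.283, 54.487);

	assert(ratio > 8.6 && ratio < 8.7);
}
```

Under that assumption a fixed 256-bucket table holds roughly 195 entries per
bucket, whereas a table that resizes with the number of entries keeps chains
short.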