Message-Id: <20240430223217.fd375d57d130a4207be18e94@kernel.org>
Date: Tue, 30 Apr 2024 22:32:17 +0900
From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>, Steven Rostedt
<rostedt@...dmis.org>, Florent Revest <revest@...omium.org>,
linux-trace-kernel@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, bpf <bpf@...r.kernel.org>, Sven
Schnelle <svens@...ux.ibm.com>, Alexei Starovoitov <ast@...nel.org>, Jiri
Olsa <jolsa@...nel.org>, Arnaldo Carvalho de Melo <acme@...nel.org>, Daniel
Borkmann <daniel@...earbox.net>, Alan Maguire <alan.maguire@...cle.com>,
Mark Rutland <mark.rutland@....com>, Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>, Guo Ren <guoren@...nel.org>
Subject: Re: [PATCH v9 00/36] tracing: fprobe: function_graph:
Multi-function graph and fprobe on fgraph
On Mon, 29 Apr 2024 13:25:04 -0700
Andrii Nakryiko <andrii.nakryiko@...il.com> wrote:
> On Mon, Apr 29, 2024 at 6:51 AM Masami Hiramatsu <mhiramat@...nel.org> wrote:
> >
> > Hi Andrii,
> >
> > On Thu, 25 Apr 2024 13:31:53 -0700
> > Andrii Nakryiko <andrii.nakryiko@...il.com> wrote:
> >
> > > Hey Masami,
> > >
> > > I can't really review most of that code as I'm completely unfamiliar
> > > with all those inner workings of fprobe/ftrace/function_graph. I left
> > > a few comments where there were somewhat more obvious BPF-related
> > > pieces.
> > >
> > > But I also did run our BPF benchmarks on probes/for-next as a baseline
> > > and then with your series applied on top. Just to see if there are any
> > > regressions. I think it will be a useful data point for you.
> >
> > Thanks for testing!
> >
> > >
> > > You should be already familiar with the bench tool we have in BPF
> > > selftests (I used it on some other patches for your tree).
> >
> > Which patches do we need?
> >
>
> You mean for this `bench` tool? They are part of BPF selftests (under
> tools/testing/selftests/bpf), you can build them by running:
>
> $ make RELEASE=1 -j$(nproc) bench
>
> After that you'll get a self-contained `bench` binary, which includes all
> the benchmarks.
>
> You might also find a small script (benchs/run_bench_trigger.sh inside
> the BPF selftests directory) helpful; it collects a final summary of the
> benchmark run and optionally accepts a specific set of benchmarks. So
> you can use it like this:
>
> $ benchs/run_bench_trigger.sh kprobe kprobe-multi
> kprobe : 18.731 ± 0.639M/s
> kprobe-multi : 23.938 ± 0.612M/s
>
> By default it will run a wider set of benchmarks (no uprobes, but a
> bunch of extra fentry/fexit tests and stuff like this).
Original kernel:
# benchs/run_bench_trigger.sh
kretprobe : 1.329 ± 0.007M/s
kretprobe-multi: 1.341 ± 0.004M/s
# benchs/run_bench_trigger.sh
kretprobe : 1.288 ± 0.014M/s
kretprobe-multi: 1.365 ± 0.002M/s
# benchs/run_bench_trigger.sh
kretprobe : 1.329 ± 0.002M/s
kretprobe-multi: 1.331 ± 0.011M/s
# benchs/run_bench_trigger.sh
kretprobe : 1.311 ± 0.003M/s
kretprobe-multi: 1.318 ± 0.002M/s
Patched kernel:
# benchs/run_bench_trigger.sh
kretprobe : 1.274 ± 0.003M/s
kretprobe-multi: 1.397 ± 0.002M/s
# benchs/run_bench_trigger.sh
kretprobe : 1.307 ± 0.002M/s
kretprobe-multi: 1.406 ± 0.004M/s
# benchs/run_bench_trigger.sh
kretprobe : 1.279 ± 0.004M/s
kretprobe-multi: 1.330 ± 0.014M/s
# benchs/run_bench_trigger.sh
kretprobe : 1.256 ± 0.010M/s
kretprobe-multi: 1.412 ± 0.003M/s
Hmm, in my case the differences seem smaller (~3%?).
I have attached perf report results for both, but I don't see a large difference.
> > >
> > > BASELINE
> > > ========
> > > kprobe : 24.634 ± 0.205M/s
> > > kprobe-multi : 28.898 ± 0.531M/s
> > > kretprobe : 10.478 ± 0.015M/s
> > > kretprobe-multi: 11.012 ± 0.063M/s
> > >
> > > THIS PATCH SET ON TOP
> > > =====================
> > > kprobe : 25.144 ± 0.027M/s (+2%)
> > > kprobe-multi : 28.909 ± 0.074M/s
> > > kretprobe : 9.482 ± 0.008M/s (-9.5%)
> > > kretprobe-multi: 13.688 ± 0.027M/s (+24%)
> >
> > This looks good. Kretprobe should also use kretprobe-multi (fprobe)
> > eventually because it should be a single callback version of
> > kretprobe-multi.
I ran another benchmark (a prctl loop, attached); the result on the original kernel is:
# sh ./benchmark.sh
count = 10000000, took 6.748133 sec
And the result on the patched kernel:
# sh ./benchmark.sh
count = 10000000, took 6.644095 sec
I confirmed that the perf results show no big difference.
Thank you,
> >
> > >
> > > These numbers are pretty stable and look to be more or less representative.
> > >
> > > As you can see, kprobes got a bit faster, kprobe-multi seems to be
> > > about the same, though.
> > >
> > > Then (I suppose they are "legacy") kretprobes got quite noticeably
> > > slower, almost by 10%. Not sure why, but looks real after re-running
> > > benchmarks a bunch of times and getting stable results.
> >
> > Hmm, kretprobe on x86 should use ftrace + rethook even with my series,
> > so nothing should have changed. Maybe the cache access pattern has
> > changed?
> > I'll check it with tracefs (to rule out effects from BPF-related changes).
> >
> > >
> > > On the other hand, multi-kretprobes got significantly faster (+24%!).
> > > Again, I don't know if it is expected or not, but it's a nice
> > > improvement.
> >
> > Thanks!
> >
> > >
> > > If you have any idea why kretprobes would get so much slower, it would
> > > be nice to look into that and see if you can mitigate the regression
> > > somehow. Thanks!
> >
> > OK, let me check it.
> >
> > Thank you!
> >
> > >
> > >
> > > > 51 files changed, 2325 insertions(+), 882 deletions(-)
> > > > create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc
> > > >
> > > > --
> > > > Masami Hiramatsu (Google) <mhiramat@...nel.org>
> > > >
> >
> >
> > --
> > Masami Hiramatsu (Google) <mhiramat@...nel.org>
--
Masami Hiramatsu (Google) <mhiramat@...nel.org>
View attachment "prctl_loop.c" of type "text/x-csrc" (555 bytes)
View attachment "perf-out-kretprobe-nopatch.txt" of type "text/plain" (65382 bytes)
View attachment "perf-out-kretprobe-patched.txt" of type "text/plain" (66043 bytes)