Message-ID: <CAEf4BzbuQxEWfTNRq9163Gi=SMDi3wCpfp+NEMVtz_BRYJxOdg@mail.gmail.com>
Date: Tue, 14 Jan 2025 11:04:35 -0800
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Jiri Olsa <olsajiri@...il.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@...nel.org>, Alexei Starovoitov <alexei.starovoitov@...il.com>,
Steven Rostedt <rostedt@...dmis.org>, Florent Revest <revest@...omium.org>,
linux-trace-kernel@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, bpf <bpf@...r.kernel.org>,
Alexei Starovoitov <ast@...nel.org>, Alan Maguire <alan.maguire@...cle.com>,
Mark Rutland <mark.rutland@....com>, linux-arch@...r.kernel.org
Subject: Re: [PATCH v22 00/20] tracing: fprobe: function_graph: Multi-function
graph and fprobe on fgraph
On Tue, Jan 14, 2025 at 7:12 AM Jiri Olsa <olsajiri@...il.com> wrote:
>
> On Fri, Jan 10, 2025 at 04:04:37PM -0800, Andrii Nakryiko wrote:
> > On Thu, Jan 2, 2025 at 5:21 AM Jiri Olsa <olsajiri@...il.com> wrote:
> > >
> > > On Thu, Dec 26, 2024 at 02:11:16PM +0900, Masami Hiramatsu (Google) wrote:
> > > > Hi,
> > > >
> > > > Here is the 22nd version of the series to re-implement the fprobe on
> > > > function-graph tracer. The previous version is;
> > > >
> > > > https://lore.kernel.org/all/173379652547.973433.2311391879173461183.stgit@devnote2/
> > > >
> > > > This version is rebased on v6.13-rc4 with fixes on [3/20] for x86-32 and
> > > > [5/20] for build error.
> > >
> > >
> > > hi,
> > > I ran the bench and I'm seeing native_sched_clock being used
> > > again in the kretprobe_multi bench:
> > >
> > >     5.85%  bench  [kernel.kallsyms]  [k] native_sched_clock
> > >            |
> > >            ---native_sched_clock
> > >               sched_clock
> > >               |
> > >                --5.83%--trace_clock_local
> > >                         ftrace_return_to_handler
> > >                         return_to_handler
> > >                         syscall
> > >                         bpf_prog_test_run_opts
> >
> > completely unrelated, Jiri, but we should stop using
> > bpf_prog_test_run_opts() for benchmarking. It goes through FD
> > refcounting, which is a tiny but unnecessary overhead; more importantly,
> > it causes cache line bouncing between multiple CPUs (when doing
> > multi-threaded benchmarks), which skews and limits results.
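To illustrate the cache-line-bouncing point, here is a userspace analogy (not the kernel's FD code): several threads bumping one shared atomic counter, standing in for a shared refcount, versus each thread bumping its own counter padded onto a separate cache line. NTHREADS, ITERS and the 64-byte line size are assumptions, and run_counters() and the other names are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <time.h>

#define NTHREADS 4        /* assumed number of benchmark threads */
#define ITERS 1000000L    /* increments per thread */

/* One counter written by every thread: each write pulls the same
 * cache line over to the writing CPU (loosely like a shared refcount). */
static atomic_long shared_ref;

/* One counter per thread, aligned to an assumed 64-byte cache line so
 * that no two threads ever write the same line. */
struct padded_counter {
    alignas(64) atomic_long v;
};
static struct padded_counter per_thread[NTHREADS];

static void *bump_shared(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERS; i++)
        atomic_fetch_add_explicit(&shared_ref, 1, memory_order_relaxed);
    return NULL;
}

static void *bump_private(void *arg)
{
    struct padded_counter *c = arg;

    for (long i = 0; i < ITERS; i++)
        atomic_fetch_add_explicit(&c->v, 1, memory_order_relaxed);
    return NULL;
}

static double now_sec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Runs NTHREADS workers on either the shared or the per-thread
 * counters; returns the total increments, stores wall time in *elapsed. */
static long run_counters(int shared, double *elapsed)
{
    pthread_t tids[NTHREADS];
    double start = now_sec();

    atomic_store(&shared_ref, 0);
    for (int i = 0; i < NTHREADS; i++) {
        atomic_store(&per_thread[i].v, 0);
        int err = pthread_create(&tids[i], NULL,
                                 shared ? bump_shared : bump_private,
                                 shared ? NULL : &per_thread[i]);
        assert(err == 0);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    *elapsed = now_sec() - start;

    if (shared)
        return atomic_load(&shared_ref);

    long total = 0;
    for (int i = 0; i < NTHREADS; i++)
        total += atomic_load(&per_thread[i].v);
    return total;
}
```

Both run_counters(1, &t) and run_counters(0, &t) return NTHREADS * ITERS; on a multi-core machine the shared variant will usually take longer, which is the kind of effect that skews multi-threaded bench numbers. The exact gap depends on the hardware.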
>
> so you mean switching to attaching to/hitting kernel functions directly,
> or, perhaps better, having a kernel module for that?
>
yes, a cheap syscall (getpgid or something). Not a kernel module, that's a
logistical hassle.
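A minimal sketch of such a trigger, assuming a Linux host: each producer thread just calls getpgid in a tight loop through syscall(2), so no file descriptor (and hence no FD refcount) is involved on the trigger path. trigger_getpgid() is a hypothetical helper, not part of the existing bench harness; the probe under test would be attached to whichever kernel function services the syscall.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical benchmark producer: enters the kernel `iterations` times
 * via a cheap, FD-free syscall and returns how many calls succeeded. */
static long trigger_getpgid(long iterations)
{
    long ok = 0;

    assert(iterations >= 0);
    for (long i = 0; i < iterations; i++) {
        /* getpgid(0) returns the calling process's process group ID. */
        if (syscall(SYS_getpgid, 0) >= 0)
            ok++;
    }
    return ok;
}
```

Each bench thread would call this in its own loop; since nothing is shared between threads on the userspace side, the measured rate should mostly reflect syscall entry plus the probe overhead.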
> jirka
>
> >
> > >                         trigger_producer_batch
> > >                         start_thread
> > >                         __GI___clone3
> > >
> > > I recall we tried to fix that before with the [1] change, but that was
> > > later replaced with the [2] changes
> > >
> > > When I remove the trace_clock_local call in __ftrace_return_to_handler,
> > > the kretprobe-multi bench gets much faster (from 7.740 to 9.190M/s, see
> > > the last block below), so it seems worth making it optional
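One way to read "make it optional", sketched as a userspace model under the assumption that each registered return callback can declare whether it needs a return timestamp: the handler reads the clock at most once, and skips it entirely when no callback asked for it. struct ret_cb, needs_rettime, handle_return() and count_hits() are hypothetical names, not the fgraph API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical descriptor for one callback on the function-return path. */
struct ret_cb {
    bool needs_rettime;                        /* wants a return timestamp? */
    void (*fn)(uint64_t rettime, void *data);  /* rettime is 0 if unused */
    void *data;
};

static unsigned long clock_reads;  /* how many times the clock was read */

static uint64_t read_clock_ns(void)
{
    struct timespec ts;

    clock_reads++;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Model of a return handler: read the clock at most once, and only if
 * at least one registered callback asked for the timestamp. */
static void handle_return(const struct ret_cb *cbs, int n)
{
    bool want_time = false;
    uint64_t rettime = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        want_time = want_time || cbs[i].needs_rettime;
    if (want_time)
        rettime = read_clock_ns();
    for (int i = 0; i < n; i++)
        cbs[i].fn(cbs[i].needs_rettime ? rettime : 0, cbs[i].data);
}

/* Example callback (e.g. a kretprobe-multi style handler) that ignores
 * the timestamp and only counts hits. */
static void count_hits(uint64_t rettime, void *data)
{
    (void)rettime;
    ++*(int *)data;
}
```

With only count_hits registered, handle_return() never touches the clock; adding one callback with needs_rettime set costs exactly one clock read per return, shared by all callbacks.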
> > >
> > > there's some decrease in the kprobe-multi benchmark compared to the base
> > > numbers (from 13.088 to 12.126M/s), and I'm not sure why yet, but other
> > > than that it seems ok
> > >
> > > base:
> > >   kprobe         : 12.873 ± 0.011M/s
> > >   kprobe-multi   : 13.088 ± 0.052M/s
> > >   kretprobe      :  6.339 ± 0.003M/s
> > >   kretprobe-multi:  7.240 ± 0.002M/s
> > >
> > > fprobe_on_fgraph:
> > >   kprobe         : 12.816 ± 0.002M/s
> > >   kprobe-multi   : 12.126 ± 0.004M/s
> > >   kretprobe      :  6.305 ± 0.018M/s
> > >   kretprobe-multi:  7.740 ± 0.003M/s
> > >
> > > removed native_sched_clock call:
> > >   kprobe         : 12.850 ± 0.006M/s
> > >   kprobe-multi   : 12.115 ± 0.006M/s
> > >   kretprobe      :  6.270 ± 0.017M/s
> > >   kretprobe-multi:  9.190 ± 0.005M/s
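For reference, the relative changes implied by the three result blocks, computed as (new - old) / old * 100. The throughput values are copied from the numbers above; struct bench_row, pct_change() and print_deltas() are illustrative names.

```c
#include <assert.h>
#include <stdio.h>

/* Throughput in M/s, copied from the three result blocks. */
struct bench_row {
    const char *name;
    double base, fgraph, no_clock;
};

static const struct bench_row rows[] = {
    { "kprobe",          12.873, 12.816, 12.850 },
    { "kprobe-multi",    13.088, 12.126, 12.115 },
    { "kretprobe",        6.339,  6.305,  6.270 },
    { "kretprobe-multi",  7.240,  7.740,  9.190 },
};

/* Relative change from old_val to new_val, in percent. */
static double pct_change(double old_val, double new_val)
{
    assert(old_val > 0);
    return (new_val - old_val) / old_val * 100.0;
}

/* Prints each benchmark's change versus the base numbers. */
static void print_deltas(void)
{
    for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
        printf("%-16s fprobe_on_fgraph %+6.2f%%  no sched_clock %+6.2f%%\n",
               rows[i].name,
               pct_change(rows[i].base, rows[i].fgraph),
               pct_change(rows[i].base, rows[i].no_clock));
}
```

By this arithmetic, kretprobe-multi improves by about 6.91% with fprobe_on_fgraph and by about 26.93% once the clock read is removed (about 18.73% over fprobe_on_fgraph alone), while kprobe-multi drops by about 7.35% and 7.43% respectively.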
> > >
> > >
> > > happy new year ;-) thanks,
> > >
> > > jirka
> > >
> > >
> > > [1] https://lore.kernel.org/bpf/172615389864.133222.14452329708227900626.stgit@devnote2/
> > > [2] https://lore.kernel.org/all/20240914214805.779822616@goodmis.org/
> > >
> >
> > [...]