Message-ID: <20221005113019.18aeda76@gandalf.local.home>
Date: Wed, 5 Oct 2022 11:30:19 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Florent Revest <revest@...omium.org>
Cc: Xu Kuohai <xukuohai@...wei.com>,
Mark Rutland <mark.rutland@....com>,
Catalin Marinas <catalin.marinas@....com>,
Daniel Borkmann <daniel@...earbox.net>,
Xu Kuohai <xukuohai@...weicloud.com>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
bpf@...r.kernel.org, Will Deacon <will@...nel.org>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Ingo Molnar <mingo@...hat.com>,
Oleg Nesterov <oleg@...hat.com>,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...gle.com>,
Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
Zi Shen Lim <zlim.lnx@...il.com>,
Pasha Tatashin <pasha.tatashin@...een.com>,
Ard Biesheuvel <ardb@...nel.org>,
Marc Zyngier <maz@...nel.org>, Guo Ren <guoren@...nel.org>,
Masami Hiramatsu <mhiramat@...nel.org>
Subject: Re: [PATCH bpf-next v2 0/4] Add ftrace direct call for arm64
On Wed, 5 Oct 2022 17:10:33 +0200
Florent Revest <revest@...omium.org> wrote:
> On Wed, Oct 5, 2022 at 5:07 PM Steven Rostedt <rostedt@...dmis.org> wrote:
> >
> > On Wed, 5 Oct 2022 22:54:15 +0800
> > Xu Kuohai <xukuohai@...wei.com> wrote:
> >
> > > 1.3 attach bpf prog with direct call, bpftrace -e 'kfunc:vfs_write {}'
> > >
> > > # dd if=/dev/zero of=/dev/null count=1000000
> > > 1000000+0 records in
> > > 1000000+0 records out
> > > 512000000 bytes (512 MB, 488 MiB) copied, 1.72973 s, 296 MB/s
> > >
> > >
> > > 1.4 attach bpf prog with indirect call, bpftrace -e 'kfunc:vfs_write {}'
> > >
> > > # dd if=/dev/zero of=/dev/null count=1000000
> > > 1000000+0 records in
> > > 1000000+0 records out
> > > 512000000 bytes (512 MB, 488 MiB) copied, 1.99179 s, 257 MB/s
>
> Thanks for the measurements Xu!
>
> > Can you show the implementation of the indirect call you used?
>
> Xu used my development branch here
> https://github.com/FlorentRevest/linux/commits/fprobe-min-args

That looks like it could be optimized quite a bit too.

Specifically this part:

static bool bpf_fprobe_entry(struct fprobe *fp, unsigned long ip, struct ftrace_regs *regs, void *private)
{
	struct bpf_fprobe_call_context *call_ctx = private;
	struct bpf_fprobe_context *fprobe_ctx = fp->ops.private;
	struct bpf_tramp_links *links = fprobe_ctx->links;
	struct bpf_tramp_links *fentry = &links[BPF_TRAMP_FENTRY];
	struct bpf_tramp_links *fmod_ret = &links[BPF_TRAMP_MODIFY_RETURN];
	struct bpf_tramp_links *fexit = &links[BPF_TRAMP_FEXIT];
	int i, ret;

	memset(&call_ctx->ctx, 0, sizeof(call_ctx->ctx));
	call_ctx->ip = ip;
	for (i = 0; i < fprobe_ctx->nr_args; i++)
		call_ctx->args[i] = ftrace_regs_get_argument(regs, i);

	for (i = 0; i < fentry->nr_links; i++)
		call_bpf_prog(fentry->links[i], &call_ctx->ctx, call_ctx->args);

	call_ctx->args[fprobe_ctx->nr_args] = 0;
	for (i = 0; i < fmod_ret->nr_links; i++) {
		ret = call_bpf_prog(fmod_ret->links[i], &call_ctx->ctx,
				    call_ctx->args);

		if (ret) {
			ftrace_regs_set_return_value(regs, ret);
			ftrace_override_function_with_return(regs);

			bpf_fprobe_exit(fp, ip, regs, private);
			return false;
		}
	}

	return fexit->nr_links;
}

There's a lot of low-hanging fruit to speed up there. I wouldn't be too
quick to throw out this solution when it hasn't had the care that direct
calls have had to speed them up.

For example, trampolines currently only allow attaching to functions with 6
or fewer parameters (3 on x86_32). You could make 7 specific callbacks, for
zero to 6 parameters, and unroll the argument loop.
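
To make the unrolling idea concrete, here is a rough userspace sketch, not
kernel code. Everything in it is made up for illustration: struct fake_regs
and fake_regs_get_argument() stand in for struct ftrace_regs and
ftrace_regs_get_argument(), and the copy_args_N() helpers model only the
argument-copy part of the entry handler. The variant is picked once, at the
point where nr_args is known, so the per-call path has no loop.

```c
#include <assert.h>

#define MAX_ARGS 6

/* Hypothetical stand-in for struct ftrace_regs (assumption, not the kernel type). */
struct fake_regs {
	unsigned long args[MAX_ARGS];
};

/* Hypothetical stand-in for ftrace_regs_get_argument(). */
static inline unsigned long
fake_regs_get_argument(const struct fake_regs *regs, unsigned int n)
{
	return regs->args[n];
}

typedef void (*copy_args_fn)(unsigned long *dst, const struct fake_regs *regs);

#define COPY_ARG(i) (dst[i] = fake_regs_get_argument(regs, i))

/* One straight-line variant per argument count, zero through six. */
static void copy_args_0(unsigned long *dst, const struct fake_regs *regs)
{
	(void)dst;
	(void)regs;
}

static void copy_args_1(unsigned long *dst, const struct fake_regs *regs)
{
	COPY_ARG(0);
}

static void copy_args_2(unsigned long *dst, const struct fake_regs *regs)
{
	COPY_ARG(0); COPY_ARG(1);
}

static void copy_args_3(unsigned long *dst, const struct fake_regs *regs)
{
	COPY_ARG(0); COPY_ARG(1); COPY_ARG(2);
}

static void copy_args_4(unsigned long *dst, const struct fake_regs *regs)
{
	COPY_ARG(0); COPY_ARG(1); COPY_ARG(2); COPY_ARG(3);
}

static void copy_args_5(unsigned long *dst, const struct fake_regs *regs)
{
	COPY_ARG(0); COPY_ARG(1); COPY_ARG(2); COPY_ARG(3); COPY_ARG(4);
}

static void copy_args_6(unsigned long *dst, const struct fake_regs *regs)
{
	COPY_ARG(0); COPY_ARG(1); COPY_ARG(2); COPY_ARG(3); COPY_ARG(4);
	COPY_ARG(5);
}

static const copy_args_fn copy_args_table[MAX_ARGS + 1] = {
	copy_args_0, copy_args_1, copy_args_2, copy_args_3,
	copy_args_4, copy_args_5, copy_args_6,
};

/* Called once at attach time; the returned pointer is used on every hit. */
static copy_args_fn select_copy_args(unsigned int nr_args)
{
	assert(nr_args <= MAX_ARGS);
	return copy_args_table[nr_args];
}
```

In a real handler the selection would more likely pick a whole entry
callback rather than a helper, so that the hot path does not pay for an
extra indirect call. Whether this actually wins anything would need to be
measured.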

It would also be interesting to run perf to see where the overhead is. There
may be other locations to work on to make it almost as fast as direct
callers without the other baggage.

-- Steve
>
> As it stands, the performance impact of the fprobe based
> implementation would be too high for us. I wonder how much Mark's idea
> here https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/ftrace/per-callsite-ops
> would help but it doesn't work right now.