Message-ID: <fb3973b6-c65e-fb98-7cdf-46c8a4cf0c4d@huawei.com>
Date: Thu, 6 Oct 2022 18:09:44 +0800
From: Xu Kuohai <xukuohai@...wei.com>
To: Steven Rostedt <rostedt@...dmis.org>,
Florent Revest <revest@...omium.org>
CC: Mark Rutland <mark.rutland@....com>,
Catalin Marinas <catalin.marinas@....com>,
Daniel Borkmann <daniel@...earbox.net>,
<linux-arm-kernel@...ts.infradead.org>,
<linux-kernel@...r.kernel.org>, <bpf@...r.kernel.org>,
Will Deacon <will@...nel.org>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Ingo Molnar <mingo@...hat.com>,
Oleg Nesterov <oleg@...hat.com>,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...gle.com>,
Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
Zi Shen Lim <zlim.lnx@...il.com>,
Pasha Tatashin <pasha.tatashin@...een.com>,
Ard Biesheuvel <ardb@...nel.org>,
Marc Zyngier <maz@...nel.org>, Guo Ren <guoren@...nel.org>,
Masami Hiramatsu <mhiramat@...nel.org>
Subject: Re: [PATCH bpf-next v2 0/4] Add ftrace direct call for arm64

On 10/5/2022 11:30 PM, Steven Rostedt wrote:
> On Wed, 5 Oct 2022 17:10:33 +0200
> Florent Revest <revest@...omium.org> wrote:
>
>> On Wed, Oct 5, 2022 at 5:07 PM Steven Rostedt <rostedt@...dmis.org> wrote:
>>>
>>> On Wed, 5 Oct 2022 22:54:15 +0800
>>> Xu Kuohai <xukuohai@...wei.com> wrote:
>>>
>>>> 1.3 attach bpf prog with direct call, bpftrace -e 'kfunc:vfs_write {}'
>>>>
>>>> # dd if=/dev/zero of=/dev/null count=1000000
>>>> 1000000+0 records in
>>>> 1000000+0 records out
>>>> 512000000 bytes (512 MB, 488 MiB) copied, 1.72973 s, 296 MB/s
>>>>
>>>>
>>>> 1.4 attach bpf prog with indirect call, bpftrace -e 'kfunc:vfs_write {}'
>>>>
>>>> # dd if=/dev/zero of=/dev/null count=1000000
>>>> 1000000+0 records in
>>>> 1000000+0 records out
>>>> 512000000 bytes (512 MB, 488 MiB) copied, 1.99179 s, 257 MB/s
>>
>> Thanks for the measurements Xu!
>>
>>> Can you show the implementation of the indirect call you used?
>>
>> Xu used my development branch here
>> https://github.com/FlorentRevest/linux/commits/fprobe-min-args
>
> That looks like it could be optimized quite a bit too.
>
> Specifically this part:
>
> static bool bpf_fprobe_entry(struct fprobe *fp, unsigned long ip, struct ftrace_regs *regs, void *private)
> {
> 	struct bpf_fprobe_call_context *call_ctx = private;
> 	struct bpf_fprobe_context *fprobe_ctx = fp->ops.private;
> 	struct bpf_tramp_links *links = fprobe_ctx->links;
> 	struct bpf_tramp_links *fentry = &links[BPF_TRAMP_FENTRY];
> 	struct bpf_tramp_links *fmod_ret = &links[BPF_TRAMP_MODIFY_RETURN];
> 	struct bpf_tramp_links *fexit = &links[BPF_TRAMP_FEXIT];
> 	int i, ret;
>
> 	memset(&call_ctx->ctx, 0, sizeof(call_ctx->ctx));
> 	call_ctx->ip = ip;
> 	for (i = 0; i < fprobe_ctx->nr_args; i++)
> 		call_ctx->args[i] = ftrace_regs_get_argument(regs, i);
>
> 	for (i = 0; i < fentry->nr_links; i++)
> 		call_bpf_prog(fentry->links[i], &call_ctx->ctx, call_ctx->args);
>
> 	call_ctx->args[fprobe_ctx->nr_args] = 0;
> 	for (i = 0; i < fmod_ret->nr_links; i++) {
> 		ret = call_bpf_prog(fmod_ret->links[i], &call_ctx->ctx,
> 				    call_ctx->args);
>
> 		if (ret) {
> 			ftrace_regs_set_return_value(regs, ret);
> 			ftrace_override_function_with_return(regs);
>
> 			bpf_fprobe_exit(fp, ip, regs, private);
> 			return false;
> 		}
> 	}
>
> 	return fexit->nr_links;
> }
>
> There's a lot of low hanging fruit to speed up there. I wouldn't be too
> fast to throw out this solution if it hasn't had the care that direct calls
> have had to speed that up.
>
> For example, trampolines currently only allow to attach to functions with 6
> parameters or less (3 on x86_32). You could make 7 specific callbacks, with
> zero to 6 parameters, and unroll the argument loop.
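
A rough userspace sketch of the "one callback per argument count" idea, to
make the suggestion concrete. It only specializes the argument-copy step, not
the whole entry handler. Everything in it is hypothetical and not taken from
the fprobe-min-args branch: struct mock_regs stands in for struct ftrace_regs,
mock_get_argument() for ftrace_regs_get_argument(), and copy_args_N /
select_copy_args() are illustrative names. Because each copy_args_N loop has a
compile-time trip count, the compiler is free to unroll it, and the variant
can be picked once at attach time instead of testing nr_args on every call.

```c
#include <stddef.h>

#define MAX_ARGS 6

/* Hypothetical stand-in for struct ftrace_regs: argument registers only. */
struct mock_regs {
	unsigned long arg[MAX_ARGS];
};

/* Hypothetical stand-in for ftrace_regs_get_argument(). */
static inline unsigned long mock_get_argument(const struct mock_regs *regs,
					      int n)
{
	return regs->arg[n];
}

typedef void (*copy_args_fn)(unsigned long *dst, const struct mock_regs *regs);

/*
 * Generate one copy routine per argument count. The loop bound is a
 * constant in each instance, so no runtime nr_args check remains.
 */
#define DEFINE_COPY_ARGS(n)						\
static void copy_args_##n(unsigned long *dst,				\
			  const struct mock_regs *regs)			\
{									\
	for (int i = 0; i < (n); i++)					\
		dst[i] = mock_get_argument(regs, i);			\
}

DEFINE_COPY_ARGS(0)
DEFINE_COPY_ARGS(1)
DEFINE_COPY_ARGS(2)
DEFINE_COPY_ARGS(3)
DEFINE_COPY_ARGS(4)
DEFINE_COPY_ARGS(5)
DEFINE_COPY_ARGS(6)

static const copy_args_fn copy_args_table[MAX_ARGS + 1] = {
	copy_args_0, copy_args_1, copy_args_2, copy_args_3,
	copy_args_4, copy_args_5, copy_args_6,
};

/* Meant to be called once at attach time; NULL if nr_args is out of range. */
copy_args_fn select_copy_args(int nr_args)
{
	if (nr_args < 0 || nr_args > MAX_ARGS)
		return NULL;
	return copy_args_table[nr_args];
}
```

For vfs_write (4 arguments) the attach path would store select_copy_args(4)
and the entry handler would call it directly. Whether this helps measurably
on arm64 is exactly what the perf data would have to show.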
>
> Would also be interesting to run perf to see where the overhead is. There
> may be other locations to work on to make it almost as fast as direct
> callers without the other baggage.
>
Something is wrong with perf on my pi4. I'll send the perf report once I
fix it.
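
Until then, a back-of-the-envelope estimate from the dd timings quoted above
(1.72973 s with direct call, 1.99179 s with indirect call, 1000000 records):
assuming each record triggers exactly one vfs_write call and that the whole
time difference is tracing overhead, the indirect path costs about 262 ns
more per call, roughly 15% of the total run time. The helper names below are
only for illustration.

```c
/* Extra time per traced call, in nanoseconds. */
double per_call_overhead_ns(double slow_s, double fast_s, double calls)
{
	return (slow_s - fast_s) / calls * 1e9;
}

/* Relative slowdown of the slow run versus the fast run, in percent. */
double slowdown_percent(double slow_s, double fast_s)
{
	return (slow_s - fast_s) / fast_s * 100.0;
}
```

Calling per_call_overhead_ns(1.99179, 1.72973, 1e6) gives the per-call figure,
and slowdown_percent(1.99179, 1.72973) gives the relative one. This is a
single run on one board, so the numbers are only indicative.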
> -- Steve
>
>>
>> As it stands, the performance impact of the fprobe based
>> implementation would be too high for us. I wonder how much Mark's idea
>> here https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/ftrace/per-callsite-ops
>> would help but it doesn't work right now.
>
>
> .