[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4Bzb_HPmGSoUX+9+LvSP2Yb95OqEQKtjpMiW1Um-rixAM8Q@mail.gmail.com>
Date: Fri, 23 Oct 2020 13:03:22 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Jiri Olsa <jolsa@...nel.org>
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andriin@...com>,
Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...omium.org>, Daniel Xu <dxu@...uu.xyz>,
Steven Rostedt <rostedt@...dmis.org>,
Jesper Brouer <jbrouer@...hat.com>,
Toke Høiland-Jørgensen <toke@...hat.com>,
Viktor Malik <vmalik@...hat.com>
Subject: Re: [RFC bpf-next 09/16] bpf: Add BPF_TRAMPOLINE_BATCH_ATTACH support
On Thu, Oct 22, 2020 at 8:01 AM Jiri Olsa <jolsa@...nel.org> wrote:
>
> Adding BPF_TRAMPOLINE_BATCH_ATTACH support, that allows to attach
> tracing multiple fentry/fexit pograms to trampolines within one
> syscall.
>
> Currently each tracing program is attached in seprate bpf syscall
> and more importantly by separate register_ftrace_direct call, which
> registers trampoline in ftrace subsystem. We can save some cycles
> by simple using its batch variant register_ftrace_direct_ips.
>
> Before:
>
> Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s*
> { printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs):
>
> 2,199,433,771 cycles:k ( +- 0.55% )
> 936,105,469 cycles:u ( +- 0.37% )
>
> 26.48 +- 3.57 seconds time elapsed ( +- 13.49% )
>
> After:
>
> Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s*
> { printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs):
>
> 1,456,854,867 cycles:k ( +- 0.57% )
> 937,737,431 cycles:u ( +- 0.13% )
>
> 12.44 +- 2.98 seconds time elapsed ( +- 23.95% )
>
> The new BPF_TRAMPOLINE_BATCH_ATTACH syscall command expects
> following data in union bpf_attr:
>
> struct {
> __aligned_u64 in;
> __aligned_u64 out;
> __u32 count;
> } trampoline_batch;
>
> in - pointer to user space array with file descrptors of loaded bpf
> programs to attach
> out - pointer to user space array for resulting link descriptor
> count - number of 'in/out' file descriptors
>
> Basically the new code gets programs from 'in' file descriptors and
> attaches them the same way the current code does, apart from the last
> step that registers probe ip with trampoline. This is done at the end
> with new register_ftrace_direct_ips function.
>
> The resulting link descriptors are written in 'out' array and match
> 'in' array file descriptors order.
>
I think this is a pretty hard API to use correctly from user-space.
Think about all those partially attached and/or partially detached BPF
programs. And subsequent clean up for them. Also there is nothing even
close to atomicity, so you might get a spurious invocation a few times
before batch-attach fails mid-way and the kernel (hopefully) will
detach those already attached programs in an attempt to clean
everything up. Debugging and handling that is a big pain for users,
IMO.
Here's a raw idea, let's think if it would be possible to implement
something like this. It seems like what you need is to create a set of
logically-grouped placeholders for multiple functions you are about to
attach to. Until the BPF program is attached, those placeholders are
just no-ops (e.g., they might jump to an "inactive" single trampoline,
which just immediately returns). Then you attach the BPF program
atomically into a single place, and all those no-op jumps to a
trampoline start to call the BPF program at the same time. It's not
strictly atomic, but is much closer in time with each other. Also,
because it's still a single trampoline, you get a nice mapping to a
single bpf_link, so detaching is not an issue.
Basically, maybe ftrace subsystem could provide a set of APIs to
prepare a set of functions to attach to. Then BPF subsystem would just
do what it does today, except instead of attaching to a specific
kernel function, it would attach to ftrace's placeholder. I don't know
anything about ftrace implementation, so this might be far off. But I
thought that looking at this problem from a bit of a different angle
would benefit the discussion. Thoughts?
> Signed-off-by: Jiri Olsa <jolsa@...nel.org>
> ---
> include/linux/bpf.h | 15 ++++++-
> include/uapi/linux/bpf.h | 7 ++++
> kernel/bpf/syscall.c | 88 ++++++++++++++++++++++++++++++++++++++--
> kernel/bpf/trampoline.c | 69 +++++++++++++++++++++++++++----
> 4 files changed, 164 insertions(+), 15 deletions(-)
>
[...]
Powered by blists - more mailing lists