[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4Bza0+MHuRneepidvXFZGZJ+hnMtaJpCq7EU=pvZHW7FD9w@mail.gmail.com>
Date: Mon, 26 Oct 2020 16:19:30 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Jiri Olsa <jolsa@...hat.com>
Cc: Steven Rostedt <rostedt@...dmis.org>, Jiri Olsa <jolsa@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andriin@...com>,
Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...omium.org>, Daniel Xu <dxu@...uu.xyz>,
Jesper Brouer <jbrouer@...hat.com>,
Toke Høiland-Jørgensen <toke@...hat.com>,
Viktor Malik <vmalik@...hat.com>
Subject: Re: [RFC bpf-next 09/16] bpf: Add BPF_TRAMPOLINE_BATCH_ATTACH support
On Sun, Oct 25, 2020 at 12:41 PM Jiri Olsa <jolsa@...hat.com> wrote:
>
> On Fri, Oct 23, 2020 at 03:23:10PM -0700, Andrii Nakryiko wrote:
> > On Fri, Oct 23, 2020 at 1:31 PM Steven Rostedt <rostedt@...dmis.org> wrote:
> > >
> > > On Fri, 23 Oct 2020 13:03:22 -0700
> > > Andrii Nakryiko <andrii.nakryiko@...il.com> wrote:
> > >
> > > > Basically, maybe ftrace subsystem could provide a set of APIs to
> > > > prepare a set of functions to attach to. Then BPF subsystem would just
> > > > do what it does today, except instead of attaching to a specific
> > > > kernel function, it would attach to ftrace's placeholder. I don't know
> > > > anything about ftrace implementation, so this might be far off. But I
> > > > thought that looking at this problem from a bit of a different angle
> > > > would benefit the discussion. Thoughts?
> > >
> > > I probably understand bpf internals as much as you understand ftrace
> > > internals ;-)
> > >
> >
> > Heh :) But while we are here, what do you think about this idea of
> > preparing a no-op trampoline, that a bunch (thousands, potentially) of
> > function entries will jump to. And once all that is ready and patched
> > through kernel functions entry points, then allow to attach BPF
> > program or ftrace callback (if I get the terminology right) in a one
> > fast and simple operation? For users that would mean that they will
> > either get calls for all or none of attached kfuncs, with a simple and
> > reliable semantics.
>
> so the main pain point the batch interface is addressing, is that
> every attach (BPF_RAW_TRACEPOINT_OPEN command) calls register_ftrace_direct,
> and you'll need to do the same for nop trampoline, no?
I guess I had a hope that if we know it's a nop that we are
installing, then we can do it without extra waiting, which should
speed it up quite a bit.
>
> I wonder if we could create some 'transaction object' represented
> by fd and add it to bpf_attr::raw_tracepoint
>
> then attach (BPF_RAW_TRACEPOINT_OPEN command) would add program to this
> new 'transaction object' instead of updating ftrace directly
>
> and when the collection is done (all BPF_RAW_TRACEPOINT_OPEN command
> are executed), we'd call new bpf syscall command on that transaction
> and it would call ftrace interface
>
This is conceptually something like what I had in mind, but I had a
single BPF program attached to many kernel functions in mind.
Something that's impossible today, as you mentioned in another thread.
> something like:
>
> bpf(TRANSACTION_NEW) = fd
> bpf(BPF_RAW_TRACEPOINT_OPEN) for prog_fd_1, fd
> bpf(BPF_RAW_TRACEPOINT_OPEN) for prog_fd_2, fd
> ...
> bpf(TRANSACTION_DONE) for fd
>
> jirka
>
> >
> > Something like this, where bpf_prog attachment (which replaces nop)
> > happens as step 2:
> >
> > +------------+ +----------+ +----------+
> > | kfunc1 | | kfunc2 | | kfunc3 |
> > +------+-----+ +----+-----+ +----+-----+
> > | | |
> > | | |
> > +---------------------------+
> > |
> > v
> > +---+---+ +-----------+
> > | nop +-----------> bpf_prog |
> > +-------+ +-----------+
> >
> >
> > > Anyway, what I'm currently working on, is a fast way to get to the
> > > arguments of a function. For now, I'm just focused on x86_64, and only add
> > > 6 argments.
> > >
> > > The main issue that Alexei had with using the ftrace trampoline, was that
> > > the only way to get to the arguments was to set the "REGS" flag, which
> > > would give a regs parameter that contained a full pt_regs. The problem with
> > > this approach is that it required saving *all* regs for every function
> > > traced. Alexei felt that this was too much overehead.
> > >
> > > Looking at Jiri's patch, I took a look at the creation of the bpf
> > > trampoline, and noticed that it's copying the regs on a stack (at least
> > > what is used, which I think could be an issue).
> >
> > Right. And BPF doesn't get access to the entire pt_regs struct, so it
> > doesn't have to pay the prices of saving it.
> >
> > But just FYI. Alexei is out till next week, so don't expect him to
> > reply in the next few days. But he's probably best to discuss these
> > nitty-gritty details with :)
> >
> > >
> > > For tracing a function, one must store all argument registers used, and
> > > restore them, as that's how they are passed from caller to callee. And
> > > since they are stored anyway, I figure, that should also be sent to the
> > > function callbacks, so that they have access to them too.
> > >
> > > I'm working on a set of patches to make this a reality.
> > >
> > > -- Steve
> >
>
Powered by blists - more mailing lists