[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzYtD9dGUy3hZRRAA56CaVvW7xTR9tp0dXKyVQXym046eQ@mail.gmail.com>
Date: Mon, 20 Jul 2020 20:48:04 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Toke Høiland-Jørgensen <toke@...hat.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
Andrii Nakryiko <andriin@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...omium.org>,
Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH bpf-next v2 3/6] bpf: support attaching freplace programs
to multiple attach points
On Mon, Jul 20, 2020 at 4:35 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Sun, Jul 19, 2020 at 10:02:48PM -0700, Andrii Nakryiko wrote:
> > On Thu, Jul 16, 2020 at 7:06 PM Alexei Starovoitov
> > <alexei.starovoitov@...il.com> wrote:
> > >
> > > On Thu, Jul 16, 2020 at 12:50:05PM +0200, Toke Høiland-Jørgensen wrote:
> > > > Alexei Starovoitov <alexei.starovoitov@...il.com> writes:
> > > >
> > > > > On Wed, Jul 15, 2020 at 03:09:02PM +0200, Toke Høiland-Jørgensen wrote:
> > > > >>
> > > > >> + if (tgt_prog_fd) {
> > > > >> + /* For now we only allow new targets for BPF_PROG_TYPE_EXT */
> > > > >> + if (prog->type != BPF_PROG_TYPE_EXT ||
> > > > >> + !btf_id) {
> > > > >> + err = -EINVAL;
> > > > >> + goto out_put_prog;
> > > > >> + }
> > > > >> + tgt_prog = bpf_prog_get(tgt_prog_fd);
> > > > >> + if (IS_ERR(tgt_prog)) {
> > > > >> + err = PTR_ERR(tgt_prog);
> > > > >> + tgt_prog = NULL;
> > > > >> + goto out_put_prog;
> > > > >> + }
> > > > >> +
> > > > >> + } else if (btf_id) {
> > > > >> + err = -EINVAL;
> > > > >> + goto out_put_prog;
> > > > >> + } else {
> > > > >> + btf_id = prog->aux->attach_btf_id;
> > > > >> + tgt_prog = prog->aux->linked_prog;
> > > > >> + if (tgt_prog)
> > > > >> + bpf_prog_inc(tgt_prog); /* we call bpf_prog_put() on link release */
> > > > >
> > > > > so the first prog_load cmd will beholding the first target prog?
> > > > > This is complete non starter.
> > > > > You didn't mention such decision anywhere.
> > > > > The first ext prog will attach to the first dispatcher xdp prog,
> > > > > then that ext prog will multi attach to second dispatcher xdp prog and
> > > > > the first dispatcher prog will live in the kernel forever.
> > > >
> > > > Huh, yeah, you're right that's no good. Missing that was a think-o on my
> > > > part, sorry about that :/
> > > >
> > > > > That's not what we discussed back in April.
> > > >
> > > > No, you mentioned turning aux->linked_prog into a list. However once I
> > > > started looking at it I figured it was better to actually have all this
> > > > (the trampoline and ref) as part of the bpf_link structure, since
> > > > logically they're related.
> > > >
> > > > But as you pointed out, the original reference sticks. So either that
> > > > needs to be removed, or I need to go back to the 'aux->linked_progs as a
> > > > list' idea. Any preference?
> > >
> > > Good question. Back then I was thinking about converting linked_prog into link
> > > list, since standalone single linked_prog is quite odd, because attaching ext
> > > prog to multiple tgt progs should have equivalent properties across all
> > > attachments.
> > > Back then bpf_link wasn't quite developed.
> > > Now I feel moving into bpf_tracing_link is better.
> > > I guess a link list of bpf_tracing_link-s from 'struct bpf_prog' might work.
> > > At prog load time we can do bpf_link_init() only (without doing bpf_link_prime)
> > > and keep this pre-populated bpf_link with target bpf prog and trampoline
> > > in a link list accessed from 'struct bpf_prog'.
> > > Then bpf_tracing_prog_attach() without extra tgt_prog_fd/btf_id would complete
> > > that bpf_tracing_link by calling bpf_link_prime() and bpf_link_settle()
> > > without allocating new one.
> > > Something like:
> > > struct bpf_tracing_link {
> > > struct bpf_link link; /* ext prog pointer is hidding in there */
> > > enum bpf_attach_type attach_type;
> > > struct bpf_trampoline *tr;
> > > struct bpf_prog *tgt_prog; /* old aux->linked_prog */
> > > };
> > >
> > > ext prog -> aux -> link list of above bpf_tracing_link-s
> > >
> > > It's a circular reference, obviously.
> > > Need to think through the complications and locking.
> > >
> > > bpf_tracing_prog_attach() with tgt_prog_fd/btf_id will alloc new bpf_tracing_link
> > > and will add it to a link list.
> > >
> > > Just a rough idea. I wonder what Andrii thinks.
> > >
> >
> > I need to spend more time reading existing and new code to see all the
> > details, but I'll throw a slightly different proposal and let you guys
> > shoot it down.
> >
> > So, what if instead of having linked_prog (as bpf_prog *, refcnt'ed),
> > at BPF_PROG_LOAD time we just record the target prog's ID. BPF
> > verifier, when doing its target prog checks would attempt to get
> > bpf_prog * reference; if by that time the target program is gone,
> > fail, of course. If not, everything proceeds as is, at the end of
> > verification target_prog is put until attach time.
> >
> > Then at attach time, we either go with pre-recorded (in
> > prog->aux->linked_prog_id) target prog's ID or we get a new one from
> > RAW_TP_OPEN tgt_prog_fd. Either way, we bump refcnt on that target
> > prog and keep it with bpf_tracing_link (so link on detach would put
> > target_prog, that way it doesn't go away while EXT prog is attached).
> > Then do all the compatibility checks, and if everything works out,
> > bpf_tracing_link gets created, we record trampoline there, etc, etc.
> > Basically, instead of having an EXT prog holding a reference to the
> > target prog, only attachment (bpf_link) does that, which conceptually
> > also seems to make more sense to me. For verification we store prog ID
> > and don't hold target prog at all.
> >
> >
> > Now, there will be a problem once you attach EXT prog to a new XDP
> > root program and release a link against the original XDP root program.
> > First, I hope I understand the desired sequence right, here's an
> > example:
> >
> > 1. load XDP root prog X
> > 2. load EXT prog with target prog X
> > 3. attach EXT prog to prog X
> > 4. load XDP root prog Y
> > 5. attach EXT prog to prog Y (Y and X should be "compatible")
> > 6. detach prog X (close bpf_link)
> >
> > Is that the right sequence?
> >
> > If yes, then the problem with storing ID of prog X in EXT
> > prog->aux->linked_prog_id is that you won't be able to re-attach to
> > new prog Z, because there won't be anything to check compatibility
> > against (prog X will be long time gone).
> >
> > So we can do two things here:
> >
> > 1. on attach, replace ext_prog->aux->linked_prog_id with the latest
> > attached prog (prog Y ID from above example)
> > 2. instead of recording target program FD/ID, capture BTF FD and/or
> > enough BTF information for checking compatibility.
> >
> > Approach 2) seems like conceptually the right thing to do (record type
> > info we care about, not an **instance** of BPF program, compatible
> > with that type info), but technically might be harder.
>
> I've read your proposal couple times and still don't get what you're
> trying to solve with either ID or BTF info recording.
> So that target prog doesn't get refcnt-ed? What's a problem with it?
> Currently it's being refcnt-d in aux->linked_prog.
> What I'm proposing about is to convert aux->linked_prog into a link list
> of bpf_tracing_links which will contain linked_prog inside.
> Conceptually that's what bpf_link is doing. It links two progs.
> EXT prog is recorded in 'struct bpf_link' and
> the target prog is recorded in 'struct bpf_tracing_link'.
> So from bpf_link perspective everything seems clean to me.
> The link list of bpf_tracing_link-s in EXT_prog->aux is only to preserve
> existing api of prog_load cmd.
Right, I wanted to avoid taking a refcnt on aux->linked_prog during
PROG_LOAD. The reason for that was (and still is) that I don't get who
and when has to bpf_prog_put() original aux->linked_prog to allow the
prog X to be freed. I.e., after you re-attach to prog Y, how prog X is
released (assuming no active bpf_link is keeping it from being freed)?
That's my biggest confusion right now.
I also didn't like the idea of half-creating bpf_tracing_link on
PROG_LOAD and then turning it into a real link with bpf_link_settle on
attach. That sounded like a hack to me.
But now I'm also confused why we need to turn aux->linked_prog into a
list. Seems like we need it only for old-style attach that doesn't
specify tgt_prog_fd, no? Only in that case we'll use aux->linked_prog.
Otherwise we know the target prog from tgt_prog_fd. So I'll be honest
that I don't get the whole idea of maintaining a list of
bpf_tracing_links. It seems like it should be possible to make
bpf_tracing_link decoupled from any prog's aux and have their own
independent lifetime.
>
> As far as step 5: attach EXT prog to prog Y (Y and X should be "compatible")
> The chance of failure there should be minimal. libxdp/libdispatcher will
> prepare rootlet XDP prog. It should really make sure that Y and X are compatible.
> This should be invisible to users.
Right, of course, but the kernel needs to validate that anyways, which
is why I pointed that out. Or are you saying we should just assume
that they are valid?
>
> In addition we still need bpf_link_update_hook() I was talking about in April.
> The full sequence is:
> first user process:
> 1. load XDP root prog X
> 1' root_link = attach X to eth0
> 2. load EXT prog with target prog X
> 3. app1_link_fd = attach EXT prog to prog X
> second user process:
> 4. load XDP root prog Y
> 4'. find EXT prog of the first user process
> 5. app2_link_fd = attach EXT prog to prog Y (Y and X should be "compatible")
> 6. bpf_link_update(root_link, X, Y); // now packet flows into Y and into EXT
> // while EXT is attached in two places
> 7. app1_link_fd' = FD in second process that points to the same tracing link
> as app1_link_fd in the first process.
> bpf_link_update_hook(app1_link_fd', app2_link_fd)
> the last operation need to update bpf_tracing_link that is held by app1
> (which is the first user process) from the second user process. It needs to
> retarget (update_hook) inside bpf_tracing_link from X to Y.
> Since the processes are more or less not aware of each other.
> One firewall holds link_fd that connects EXT to X,
> but the second firewall (via libxdp) is updaing that tracing link
> to re-hook EXT into Y.
Yeah, should be doable given that bpf_trampoline is independently refcounted.
Powered by blists - more mailing lists