[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y0CsATkd2qK1Mu2Z@google.com>
Date: Fri, 7 Oct 2022 15:45:21 -0700
From: sdf@...gle.com
To: Daniel Borkmann <daniel@...earbox.net>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>,
"Toke Høiland-Jørgensen" <toke@...hat.com>,
bpf <bpf@...r.kernel.org>,
Nikolay Aleksandrov <razor@...ckwall.org>,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
John Fastabend <john.fastabend@...il.com>,
Joanne Koong <joannelkoong@...il.com>,
Kumar Kartikeya Dwivedi <memxor@...il.com>,
Joe Stringer <joe@...ium.io>,
Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach tc
BPF programs
On 10/07, Daniel Borkmann wrote:
> On 10/7/22 8:59 PM, Alexei Starovoitov wrote:
> > On Fri, Oct 7, 2022 at 10:20 AM Toke H�iland-J�rgensen
> <toke@...hat.com> wrote:
> [...]
> > > > > I was thinking a little about how this might work; i.e., how can
> the
> > > > > kernel expose the required knobs to allow a system policy to be
> > > > > implemented without program loading having to talk to anything
> other
> > > > > than the syscall API?
> > > >
> > > > > How about we only expose prepend/append in the prog attach UAPI,
> and
> > > > > then have a kernel function that does the sorting like:
> > > >
> > > > > int bpf_add_new_tcx_prog(struct bpf_prog *progs, size_t
> num_progs, struct
> > > > > bpf_prog *new_prog, bool append)
> > > >
> > > > > where the default implementation just appends/prepends to the
> array in
> > > > > progs depending on the value of 'appen'.
> > > >
> > > > > And then use the __weak linking trick (or maybe struct_ops with a
> member
> > > > > for TXC, another for XDP, etc?) to allow BPF to override the
> function
> > > > > wholesale and implement whatever ordering it wants? I.e., allow
> it can
> > > > > to just shift around the order of progs in the 'progs' array
> whenever a
> > > > > program is loaded/unloaded?
> > > >
> > > > > This way, a userspace daemon can implement any policy it wants by
> just
> > > > > attaching to that hook, and keeping things like how to express
> > > > > dependencies as a userspace concern?
> > > >
> > > > What if we do the above, but instead of simple global 'attach
> first/last',
> > > > the default api would be:
> > > >
> > > > - attach before <target_fd>
> > > > - attach after <target_fd>
> > > > - attach before target_fd=-1 == first
> > > > - attach after target_fd=-1 == last
> > > >
> > > > ?
> > >
> > > Hmm, the problem with that is that applications don't generally have
> an
> > > fd to another application's BPF programs; and obtaining them from an
> ID
> > > is a privileged operation (CAP_SYS_ADMIN). We could have it be "attach
> > > before target *ID*" instead, which could work I guess? But then the
> > > problem becomes that it's racy: the ID you're targeting could get
> > > detached before you attach, so you'll need to be prepared to check
> that
> > > and retry; and I'm almost certain that applications won't test for
> this,
> > > so it'll just lead to hard-to-debug heisenbugs. Or am I being too
> > > pessimistic here?
> >
> > I like Stan's proposal and don't see any issue with FD.
> > It's good to gate specific sequencing with cap_sys_admin.
> > Also for consistency the FD is better than ID.
> >
> > I also like systemd analogy with Before=, After=.
> > systemd has a ton more ways to specify deps between Units,
> > but none of them have absolute numbers (which is what priority is).
> > The only bit I'd tweak in Stan's proposal is:
> > - attach before <target_fd>
> > - attach after <target_fd>
> > - attach before target_fd=0 == first
> > - attach after target_fd=0 == last
> I think the before(), after() could work, but the target_fd I have my
> doubts
> that it will be practical. Maybe lets walk through a concrete real
> example. app_a
> and app_b shipped via container_a resp container_b. Both want to install
> tc BPF
> and we (operator/user) want to say that prog from app_b should only be
> inserted
> after the one from app_a, never run before; if no prog_a is installed, we
> ofc just
> run prog_b, but if prog_a is inserted, it must be before prog_b given the
> latter
> can only run after the former. How would we get to one anothers target
> fd? One
> could use the 0, but not if more programs sit before/after.
This fd/id has to be definitely abstracted by the loader. With the
program, we would ship some metadata like 'run_after:prog_a' for
prog_b (where prog_a might be literal function name maybe?).
However, this also depends on 'run_before:prog_b' in prog_a (in
case it happens to be started after prog_b) :-/
So yeah, some central place might still be needed; in this case, Toke's
suggestion on overriding this via bpf seems like the most flexible one.
Or maybe libbpf can consult some /etc/bpf.init.d/ directory for those?
Not sure if it's too much for libbpf or it's better done by the higher
levels? I guess we can rely on the program names and then all we really
need is some place to say 'prog X happens before Y' and for the loaders
to interpret that.
> To me it sounds reasonable to have the append mode as default mode/API,
> and an advanced option to say 'I want to run as 2nd prog, but if something
> is already attached as 2nd prog, shift all the others +1 in the array'
> which
> would relate to your above point, Stan, of being able to stick into any
> place in the chain.
Replying to your other email here:
I'd still prefer, from the user side, to be able to stick my prog into
any place for debugging. But you suggestion to shift others for +1 works
for me.
(although, not sure, for example, what happens if I want to shift right the
program that's at position 65k; aka already last?)
IMO, having explicit before/after+target is slightly better usability-wise
than juggling priorities, but I'm fine with either way.
Powered by blists - more mailing lists