Message-ID: <20220526000332.soaacn3n7bic3fq5@kafai-mbp>
Date: Wed, 25 May 2022 17:03:32 -0700
From: Martin KaFai Lau <kafai@...com>
To: sdf@...gle.com
Cc: Andrii Nakryiko <andrii.nakryiko@...il.com>,
Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>
Subject: Re: [PATCH bpf-next v7 05/11] bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP
On Wed, May 25, 2022 at 02:25:54PM -0700, sdf@...gle.com wrote:
> On 05/25, Martin KaFai Lau wrote:
> > On Wed, May 25, 2022 at 10:02:07AM -0700, Stanislav Fomichev wrote:
> > > On Wed, May 25, 2022 at 9:01 AM Stanislav Fomichev <sdf@...gle.com>
> > wrote:
> > > >
> > > > On Tue, May 24, 2022 at 9:39 PM Andrii Nakryiko
> > > > <andrii.nakryiko@...il.com> wrote:
> > > > >
> > > > > On Tue, May 24, 2022 at 9:03 PM Stanislav Fomichev
> > <sdf@...gle.com> wrote:
> > > > > >
> > > > > > On Tue, May 24, 2022 at 4:45 PM Andrii Nakryiko
> > > > > > <andrii.nakryiko@...il.com> wrote:
> > > > > > >
> > > > > > > On Tue, May 24, 2022 at 10:50 AM Martin KaFai Lau
> > <kafai@...com> wrote:
> > > > > > > >
> > > > > > > > On Tue, May 24, 2022 at 08:55:04AM -0700, Stanislav Fomichev
> > wrote:
> > > > > > > > > On Mon, May 23, 2022 at 8:49 PM Martin KaFai Lau
> > <kafai@...com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, May 18, 2022 at 03:55:25PM -0700, Stanislav
> > Fomichev wrote:
> > > > > > > > > > > We have two options:
> > > > > > > > > > > 1. Treat all BPF_LSM_CGROUP the same, regardless of
> > attach_btf_id
> > > > > > > > > > > 2. Treat BPF_LSM_CGROUP+attach_btf_id as a separate
> > hook point
> > > > > > > > > > >
> > > > > > > > > > > I was doing (2) in the original patch, but switching
> > to (1) here:
> > > > > > > > > > >
> > > > > > > > > > > * bpf_prog_query returns all attached BPF_LSM_CGROUP
> > programs
> > > > > > > > > > > regardless of attach_btf_id
> > > > > > > > > > > * attach_btf_id is exported via bpf_prog_info
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Stanislav Fomichev <sdf@...gle.com>
> > > > > > > > > > > ---
> > > > > > > > > > > include/uapi/linux/bpf.h | 5 ++
> > > > > > > > > > > kernel/bpf/cgroup.c | 103
> > +++++++++++++++++++++++++++------------
> > > > > > > > > > > kernel/bpf/syscall.c | 4 +-
> > > > > > > > > > > 3 files changed, 81 insertions(+), 31 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/include/uapi/linux/bpf.h
> > b/include/uapi/linux/bpf.h
> > > > > > > > > > > index b9d2d6de63a7..432fc5f49567 100644
> > > > > > > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > > > > > > @@ -1432,6 +1432,7 @@ union bpf_attr {
> > > > > > > > > > > __u32 attach_flags;
> > > > > > > > > > > __aligned_u64 prog_ids;
> > > > > > > > > > > __u32 prog_cnt;
> > > > > > > > > > > + __aligned_u64 prog_attach_flags; /*
> > output: per-program attach_flags */
> > > > > > > > > > > } query;
> > > > > > > > > > >
> > > > > > > > > > > struct { /* anonymous struct used by
> > BPF_RAW_TRACEPOINT_OPEN command */
> > > > > > > > > > > @@ -5911,6 +5912,10 @@ struct bpf_prog_info {
> > > > > > > > > > > __u64 run_cnt;
> > > > > > > > > > > __u64 recursion_misses;
> > > > > > > > > > > __u32 verified_insns;
> > > > > > > > > > > + /* BTF ID of the function to attach to within
> > BTF object identified
> > > > > > > > > > > + * by btf_id.
> > > > > > > > > > > + */
> > > > > > > > > > > + __u32 attach_btf_func_id;
> > > > > > > > > > > } __attribute__((aligned(8)));
> > > > > > > > > > >
> > > > > > > > > > > struct bpf_map_info {
> > > > > > > > > > > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > > > > > > > > > > index a959cdd22870..08a1015ee09e 100644
> > > > > > > > > > > --- a/kernel/bpf/cgroup.c
> > > > > > > > > > > +++ b/kernel/bpf/cgroup.c
> > > > > > > > > > > @@ -1074,6 +1074,7 @@ static int
> > cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> > > > > > > > > > > static int __cgroup_bpf_query(struct cgroup *cgrp,
> > const union bpf_attr *attr,
> > > > > > > > > > > union bpf_attr __user
> > *uattr)
> > > > > > > > > > > {
> > > > > > > > > > > + __u32 __user *prog_attach_flags =
> > u64_to_user_ptr(attr->query.prog_attach_flags);
> > > > > > > > > > > __u32 __user *prog_ids =
> > u64_to_user_ptr(attr->query.prog_ids);
> > > > > > > > > > > enum bpf_attach_type type =
> > attr->query.attach_type;
> > > > > > > > > > > enum cgroup_bpf_attach_type atype;
> > > > > > > > > > > @@ -1081,50 +1082,92 @@ static int
> > __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
> > > > > > > > > > > struct hlist_head *progs;
> > > > > > > > > > > struct bpf_prog *prog;
> > > > > > > > > > > int cnt, ret = 0, i;
> > > > > > > > > > > + int total_cnt = 0;
> > > > > > > > > > > u32 flags;
> > > > > > > > > > >
> > > > > > > > > > > - atype = to_cgroup_bpf_attach_type(type);
> > > > > > > > > > > - if (atype < 0)
> > > > > > > > > > > - return -EINVAL;
> > > > > > > > > > > + enum cgroup_bpf_attach_type from_atype, to_atype;
> > > > > > > > > > >
> > > > > > > > > > > - progs = &cgrp->bpf.progs[atype];
> > > > > > > > > > > - flags = cgrp->bpf.flags[atype];
> > > > > > > > > > > + if (type == BPF_LSM_CGROUP) {
> > > > > > > > > > > + from_atype = CGROUP_LSM_START;
> > > > > > > > > > > + to_atype = CGROUP_LSM_END;
> > > > > > > > > > > + } else {
> > > > > > > > > > > + from_atype =
> > to_cgroup_bpf_attach_type(type);
> > > > > > > > > > > + if (from_atype < 0)
> > > > > > > > > > > + return -EINVAL;
> > > > > > > > > > > + to_atype = from_atype;
> > > > > > > > > > > + }
> > > > > > > > > > >
> > > > > > > > > > > - effective =
> > rcu_dereference_protected(cgrp->bpf.effective[atype],
> > > > > > > > > > > -
> > lockdep_is_held(&cgroup_mutex));
> > > > > > > > > > > + for (atype = from_atype; atype <= to_atype;
> > atype++) {
> > > > > > > > > > > + progs = &cgrp->bpf.progs[atype];
> > > > > > > > > > > + flags = cgrp->bpf.flags[atype];
> > > > > > > > > > >
> > > > > > > > > > > - if (attr->query.query_flags &
> > BPF_F_QUERY_EFFECTIVE)
> > > > > > > > > > > - cnt = bpf_prog_array_length(effective);
> > > > > > > > > > > - else
> > > > > > > > > > > - cnt = prog_list_length(progs);
> > > > > > > > > > > + effective =
> > rcu_dereference_protected(cgrp->bpf.effective[atype],
> > > > > > > > > > > +
> > lockdep_is_held(&cgroup_mutex));
> > > > > > > > > > >
> > > > > > > > > > > - if (copy_to_user(&uattr->query.attach_flags,
> > &flags, sizeof(flags)))
> > > > > > > > > > > - return -EFAULT;
> > > > > > > > > > > - if (copy_to_user(&uattr->query.prog_cnt, &cnt,
> > sizeof(cnt)))
> > > > > > > > > > > + if (attr->query.query_flags &
> > BPF_F_QUERY_EFFECTIVE)
> > > > > > > > > > > + total_cnt +=
> > bpf_prog_array_length(effective);
> > > > > > > > > > > + else
> > > > > > > > > > > + total_cnt +=
> > prog_list_length(progs);
> > > > > > > > > > > + }
> > > > > > > > > > > +
> > > > > > > > > > > + if (type != BPF_LSM_CGROUP)
> > > > > > > > > > > + if
> > (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
> > > > > > > > > > > + return -EFAULT;
> > > > > > > > > > > + if (copy_to_user(&uattr->query.prog_cnt,
> > &total_cnt, sizeof(total_cnt)))
> > > > > > > > > > > return -EFAULT;
> > > > > > > > > > > - if (attr->query.prog_cnt == 0 || !prog_ids ||
> > !cnt)
> > > > > > > > > > > + if (attr->query.prog_cnt == 0 || !prog_ids ||
> > !total_cnt)
> > > > > > > > > > > /* return early if user requested only
> > program count + flags */
> > > > > > > > > > > return 0;
> > > > > > > > > > > - if (attr->query.prog_cnt < cnt) {
> > > > > > > > > > > - cnt = attr->query.prog_cnt;
> > > > > > > > > > > +
> > > > > > > > > > > + if (attr->query.prog_cnt < total_cnt) {
> > > > > > > > > > > + total_cnt = attr->query.prog_cnt;
> > > > > > > > > > > ret = -ENOSPC;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > - if (attr->query.query_flags &
> > BPF_F_QUERY_EFFECTIVE) {
> > > > > > > > > > > - return
> > bpf_prog_array_copy_to_user(effective, prog_ids, cnt);
> > > > > > > > > > > - } else {
> > > > > > > > > > > - struct bpf_prog_list *pl;
> > > > > > > > > > > - u32 id;
> > > > > > > > > > > + for (atype = from_atype; atype <= to_atype;
> > atype++) {
> > > > > > > > > > > + if (total_cnt <= 0)
> > > > > > > > > > > + break;
> > > > > > > > > > >
> > > > > > > > > > > - i = 0;
> > > > > > > > > > > - hlist_for_each_entry(pl, progs, node) {
> > > > > > > > > > > - prog = prog_list_prog(pl);
> > > > > > > > > > > - id = prog->aux->id;
> > > > > > > > > > > - if (copy_to_user(prog_ids + i,
> > &id, sizeof(id)))
> > > > > > > > > > > - return -EFAULT;
> > > > > > > > > > > - if (++i == cnt)
> > > > > > > > > > > - break;
> > > > > > > > > > > + progs = &cgrp->bpf.progs[atype];
> > > > > > > > > > > + flags = cgrp->bpf.flags[atype];
> > > > > > > > > > > +
> > > > > > > > > > > + effective =
> > rcu_dereference_protected(cgrp->bpf.effective[atype],
> > > > > > > > > > > +
> > lockdep_is_held(&cgroup_mutex));
> > > > > > > > > > > +
> > > > > > > > > > > + if (attr->query.query_flags &
> > BPF_F_QUERY_EFFECTIVE)
> > > > > > > > > > > + cnt =
> > bpf_prog_array_length(effective);
> > > > > > > > > > > + else
> > > > > > > > > > > + cnt = prog_list_length(progs);
> > > > > > > > > > > +
> > > > > > > > > > > + if (cnt >= total_cnt)
> > > > > > > > > > > + cnt = total_cnt;
> > > > > > > > > > > +
> > > > > > > > > > > + if (attr->query.query_flags &
> > BPF_F_QUERY_EFFECTIVE) {
> > > > > > > > > > > + ret =
> > bpf_prog_array_copy_to_user(effective, prog_ids, cnt);
> > > > > > > > > > > + } else {
> > > > > > > > > > > + struct bpf_prog_list *pl;
> > > > > > > > > > > + u32 id;
> > > > > > > > > > > +
> > > > > > > > > > > + i = 0;
> > > > > > > > > > > + hlist_for_each_entry(pl, progs,
> > node) {
> > > > > > > > > > > + prog = prog_list_prog(pl);
> > > > > > > > > > > + id = prog->aux->id;
> > > > > > > > > > > + if
> > (copy_to_user(prog_ids + i, &id, sizeof(id)))
> > > > > > > > > > > + return -EFAULT;
> > > > > > > > > > > + if (++i == cnt)
> > > > > > > > > > > + break;
> > > > > > > > > > > + }
> > > > > > > > > > > }
> > > > > > > > > > > +
> > > > > > > > > > > + if (prog_attach_flags)
> > > > > > > > > > > + for (i = 0; i < cnt; i++)
> > > > > > > > > > > + if
> > (copy_to_user(prog_attach_flags + i, &flags, sizeof(flags)))
> > > > > > > > > > > + return -EFAULT;
> > > > > > > > > > > +
> > > > > > > > > > > + prog_ids += cnt;
> > > > > > > > > > > + total_cnt -= cnt;
> > > > > > > > > > > + if (prog_attach_flags)
> > > > > > > > > > > + prog_attach_flags += cnt;
> > > > > > > > > > > }
> > > > > > > > > > > return ret;
> > > > > > > > > > > }
> > > > > > > > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > > > > > > > > index 5ed2093e51cc..4137583c04a2 100644
> > > > > > > > > > > --- a/kernel/bpf/syscall.c
> > > > > > > > > > > +++ b/kernel/bpf/syscall.c
> > > > > > > > > > > @@ -3520,7 +3520,7 @@ static int bpf_prog_detach(const
> > union bpf_attr *attr)
> > > > > > > > > > > }
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > -#define BPF_PROG_QUERY_LAST_FIELD query.prog_cnt
> > > > > > > > > > > +#define BPF_PROG_QUERY_LAST_FIELD
> > query.prog_attach_flags
> > > > > > > > > > >
> > > > > > > > > > > static int bpf_prog_query(const union bpf_attr *attr,
> > > > > > > > > > > union bpf_attr __user *uattr)
> > > > > > > > > > > @@ -3556,6 +3556,7 @@ static int bpf_prog_query(const
> > union bpf_attr *attr,
> > > > > > > > > > > case BPF_CGROUP_SYSCTL:
> > > > > > > > > > > case BPF_CGROUP_GETSOCKOPT:
> > > > > > > > > > > case BPF_CGROUP_SETSOCKOPT:
> > > > > > > > > > > + case BPF_LSM_CGROUP:
> > > > > > > > > > > return cgroup_bpf_prog_query(attr, uattr);
> > > > > > > > > > > case BPF_LIRC_MODE2:
> > > > > > > > > > > return lirc_prog_query(attr, uattr);
> > > > > > > > > > > @@ -4066,6 +4067,7 @@ static int
> > bpf_prog_get_info_by_fd(struct file *file,
> > > > > > > > > > >
> > > > > > > > > > > if (prog->aux->btf)
> > > > > > > > > > > info.btf_id = btf_obj_id(prog->aux->btf);
> > > > > > > > > > > + info.attach_btf_func_id =
> > prog->aux->attach_btf_id;
> > > > > > > > > > Note that exposing prog->aux->attach_btf_id only may not
> > be enough
> > > > > > > > > > unless it can assume info.attach_btf_id is always
> > referring to btf_vmlinux
> > > > > > > > > > for all bpf prog types.
> > > > > > > > >
> > > > > > > > > > We also export btf_id two lines above, right? Btw, I left a comment in
> > > > > > > > > > the bpftool about those btf_ids, I'm not sure how to resolve them and
> > > > > > > > > > always assume vmlinux for now.
> > > > > > > > > yeah, that btf_id above is the cgroup-lsm prog's btf_id which has its
> > > > > > > > > func info, line info...etc. It is not the one the attach_btf_id corresponds
> > > > > > > > > to. attach_btf_id refers to either aux->attach_btf or aux->dst_prog's btf (or
> > > > > > > > > target btf id here).
> > > > > > > >
> > > > > > > > It needs a consensus on where this attach_btf_id, target btf
> > id, and
> > > > > > > > prog_attach_flags should be. If I read the patch 7 thread
> > correctly,
> > > > > > > > I think Andrii is suggesting to expose them to userspace
> > through link, so
> > > > > > > > potentially putting them in bpf_link_info. The
> > bpf_prog_query will
> > > > > > > > output a list of link ids. The same probably applies to
> > > > > > >
> > > > > > > Yep and I think it makes sense because link is representing one
> > > > > > > specific attachment (and I presume flags can be stored inside
> > the link
> > > > > > > itself as well, right?).
> > > > > > >
> > > > > > > But if legacy non-link BPF_PROG_ATTACH is supported then using
> > > > > > > bpf_link_info won't cover legacy prog-only attachments.
> > > > > >
> > > > > > I don't have any attachment to the legacy apis, I'm supporting
> > them
> > > > > > only because it takes two lines of code; we can go link-only if
> > there
> > > > > > is an agreement that it's inherently better.
> > > > > >
> > > > > > > How about I keep sys_bpf(BPF_PROG_QUERY) as is and I do a loop in the
> > > > > > > userspace (for BPF_LSM_CGROUP only) over all links
> > > > > > > (BPF_LINK_GET_NEXT_ID) and will find the ones with matching prog
> > > > > > > ids (BPF_LINK_GET_FD_BY_ID+BPF_OBJ_GET_INFO_BY_FD)?
> > > > > >
> > > > > > That way we keep new fields in bpf_link_info, but we don't have to
> > > > > > extend sys_bpf(BPF_PROG_QUERY) because there doesn't seem to be
> > a good
> > > > > > way to do it. Exporting links via new link_fds would mean we'd
> > have to
> > > > > > support BPF_F_QUERY_EFFECTIVE, but getting an effective array of
> > links
> > > > > > seems to be messy. If, in the future, we figure out a better way
> > to
> > I don't see a clean way to get the effective array from one individual
> > link[_info] through link iteration. The effective array is the list of
> > progs that will be run at a cgroup, in run order. A prog running at a
> > cgroup isn't necessarily linked to that cgroup.
>
> Yeah, that's the problem with exposing links via prog_info; getting an
> effective list is painful.
>
> > If staying with BPF_PROG_QUERY+BPF_F_QUERY_EFFECTIVE to get effective array
> > and if it is decided the addition should be done in bpf_link_info,
> > then a list of link ids needs to be output instead of the current list of
> > prog ids. The old attach type will still have to stay with the list of
> > prog ids though :/
>
> > It will be sad not to be able to get effective only for BPF_LSM_CGROUP.
> > I found it more useful to show what will be run at a cgroup and in such
> > order instead of what is linked to a cgroup.
>
> See my hacky proof-of-concept below (on top of this series).
Yeah, the PoC makes sense and I don't mind it, considering that
adding them to bpf_link_info (or bpf_prog_info) will be useful in
general even without this use case.
A quick thought: this is partly going back to v6, just iterating
over different things instead of the bpf_lsm hooks.
>
> I think if we keep prog_info as is (don't export anything new, don't
> export the list of links), iterating through all links on the host should
> work, right? We get prog_ids list (effective or not, doesn't matter), then we
> go through all the links and find the ones with the same
> prog_id (we can ignore cgroup, it shouldn't matter). Then we can export
> attach_type/attach_btf_id/etc. If it happens to be slow in the future,
> we can improve with some tbd interface to get the list of links for cgroup
> (and then we'd have to care about effective list).
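For reference, the matching step described here could be sketched roughly as
below. This is only a model under assumptions, not the PoC itself: struct
link_rec and match_links() are hypothetical names, and the links array stands
in for what a real tool would collect by walking BPF_LINK_GET_NEXT_ID,
BPF_LINK_GET_FD_BY_ID and BPF_OBJ_GET_INFO_BY_FD. It also assumes
attach_btf_id ends up being exported per link, which is still undecided.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-link fields of interest. A real tool
 * would fill one of these per link from the link info returned by
 * BPF_OBJ_GET_INFO_BY_FD while walking BPF_LINK_GET_NEXT_ID.
 */
struct link_rec {
	uint32_t link_id;
	uint32_t prog_id;
	uint32_t attach_btf_id;	/* assumption: exported per link */
};

/* Return 1 if prog_id is in the list reported by BPF_PROG_QUERY. */
static int prog_id_listed(const uint32_t *prog_ids, size_t nr_progs,
			  uint32_t prog_id)
{
	for (size_t i = 0; i < nr_progs; i++)
		if (prog_ids[i] == prog_id)
			return 1;
	return 0;
}

/* Walk every link on the host and keep the ones whose prog_id was
 * reported by the query. The cgroup is ignored on purpose. Returns the
 * number of matches; at most out_cap of them are stored in out.
 */
static size_t match_links(const uint32_t *prog_ids, size_t nr_progs,
			  const struct link_rec *links, size_t nr_links,
			  struct link_rec *out, size_t out_cap)
{
	size_t found = 0;

	for (size_t i = 0; i < nr_links; i++) {
		if (!prog_id_listed(prog_ids, nr_progs, links[i].prog_id))
			continue;
		if (found < out_cap)
			out[found] = links[i];
		found++;
	}
	return found;
}
```

Note that the result comes back in link iteration order, not in the order of
the prog_ids list, so the caller would still have to use the prog_ids list
itself if it wants to show the effective run order.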
>
> But the problem with going link-only is that I'd have to teach bpftool
> to use links for BPF_LSM_CGROUP and it brings a bunch of problems:
> * I'd have to pin those links somewhere to make them stick around
> * Those pin paths essentially become an API now because "detach" now
> depends on them?
> * (right now it automatically works with the legacy apis without any
> changes)
That is already how the current API works for all links (tracing, cgroup...):
a link goes away (detaches) with the process unless it is pinned. But yeah,
it would be a new exception in the "bpftool cgroup" subcommand, only for
BPF_LSM_CGROUP.
If it is an issue for your use case, maybe go back to v6, which extends
the query bpf_attr with attach_btf_id, and support both attach APIs?