Message-ID: <20220525203935.xkjeb7qkfltjsfqc@kafai-mbp>
Date: Wed, 25 May 2022 13:39:35 -0700
From: Martin KaFai Lau <kafai@...com>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: Andrii Nakryiko <andrii.nakryiko@...il.com>,
Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>
Subject: Re: [PATCH bpf-next v7 05/11] bpf: implement BPF_PROG_QUERY for
BPF_LSM_CGROUP
On Wed, May 25, 2022 at 10:02:07AM -0700, Stanislav Fomichev wrote:
> On Wed, May 25, 2022 at 9:01 AM Stanislav Fomichev <sdf@...gle.com> wrote:
> >
> > On Tue, May 24, 2022 at 9:39 PM Andrii Nakryiko
> > <andrii.nakryiko@...il.com> wrote:
> > >
> > > On Tue, May 24, 2022 at 9:03 PM Stanislav Fomichev <sdf@...gle.com> wrote:
> > > >
> > > > On Tue, May 24, 2022 at 4:45 PM Andrii Nakryiko
> > > > <andrii.nakryiko@...il.com> wrote:
> > > > >
> > > > > On Tue, May 24, 2022 at 10:50 AM Martin KaFai Lau <kafai@...com> wrote:
> > > > > >
> > > > > > On Tue, May 24, 2022 at 08:55:04AM -0700, Stanislav Fomichev wrote:
> > > > > > > On Mon, May 23, 2022 at 8:49 PM Martin KaFai Lau <kafai@...com> wrote:
> > > > > > > >
> > > > > > > > On Wed, May 18, 2022 at 03:55:25PM -0700, Stanislav Fomichev wrote:
> > > > > > > > > We have two options:
> > > > > > > > > 1. Treat all BPF_LSM_CGROUP the same, regardless of attach_btf_id
> > > > > > > > > 2. Treat BPF_LSM_CGROUP+attach_btf_id as a separate hook point
> > > > > > > > >
> > > > > > > > > I was doing (2) in the original patch, but switching to (1) here:
> > > > > > > > >
> > > > > > > > > * bpf_prog_query returns all attached BPF_LSM_CGROUP programs
> > > > > > > > > regardless of attach_btf_id
> > > > > > > > > * attach_btf_id is exported via bpf_prog_info
> > > > > > > > >
> > > > > > > > > Signed-off-by: Stanislav Fomichev <sdf@...gle.com>
> > > > > > > > > ---
> > > > > > > > > include/uapi/linux/bpf.h | 5 ++
> > > > > > > > > kernel/bpf/cgroup.c | 103 +++++++++++++++++++++++++++------------
> > > > > > > > > kernel/bpf/syscall.c | 4 +-
> > > > > > > > > 3 files changed, 81 insertions(+), 31 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > > > > index b9d2d6de63a7..432fc5f49567 100644
> > > > > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > > > > @@ -1432,6 +1432,7 @@ union bpf_attr {
> > > > > > > > > __u32 attach_flags;
> > > > > > > > > __aligned_u64 prog_ids;
> > > > > > > > > __u32 prog_cnt;
> > > > > > > > > + __aligned_u64 prog_attach_flags; /* output: per-program attach_flags */
> > > > > > > > > } query;
> > > > > > > > >
> > > > > > > > > struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
> > > > > > > > > @@ -5911,6 +5912,10 @@ struct bpf_prog_info {
> > > > > > > > > __u64 run_cnt;
> > > > > > > > > __u64 recursion_misses;
> > > > > > > > > __u32 verified_insns;
> > > > > > > > > + /* BTF ID of the function to attach to within BTF object identified
> > > > > > > > > + * by btf_id.
> > > > > > > > > + */
> > > > > > > > > + __u32 attach_btf_func_id;
> > > > > > > > > } __attribute__((aligned(8)));
> > > > > > > > >
> > > > > > > > > struct bpf_map_info {
> > > > > > > > > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > > > > > > > > index a959cdd22870..08a1015ee09e 100644
> > > > > > > > > --- a/kernel/bpf/cgroup.c
> > > > > > > > > +++ b/kernel/bpf/cgroup.c
> > > > > > > > > @@ -1074,6 +1074,7 @@ static int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> > > > > > > > > static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
> > > > > > > > > union bpf_attr __user *uattr)
> > > > > > > > > {
> > > > > > > > > + __u32 __user *prog_attach_flags = u64_to_user_ptr(attr->query.prog_attach_flags);
> > > > > > > > > __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
> > > > > > > > > enum bpf_attach_type type = attr->query.attach_type;
> > > > > > > > > enum cgroup_bpf_attach_type atype;
> > > > > > > > > @@ -1081,50 +1082,92 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
> > > > > > > > > struct hlist_head *progs;
> > > > > > > > > struct bpf_prog *prog;
> > > > > > > > > int cnt, ret = 0, i;
> > > > > > > > > + int total_cnt = 0;
> > > > > > > > > u32 flags;
> > > > > > > > >
> > > > > > > > > - atype = to_cgroup_bpf_attach_type(type);
> > > > > > > > > - if (atype < 0)
> > > > > > > > > - return -EINVAL;
> > > > > > > > > + enum cgroup_bpf_attach_type from_atype, to_atype;
> > > > > > > > >
> > > > > > > > > - progs = &cgrp->bpf.progs[atype];
> > > > > > > > > - flags = cgrp->bpf.flags[atype];
> > > > > > > > > + if (type == BPF_LSM_CGROUP) {
> > > > > > > > > + from_atype = CGROUP_LSM_START;
> > > > > > > > > + to_atype = CGROUP_LSM_END;
> > > > > > > > > + } else {
> > > > > > > > > + from_atype = to_cgroup_bpf_attach_type(type);
> > > > > > > > > + if (from_atype < 0)
> > > > > > > > > + return -EINVAL;
> > > > > > > > > + to_atype = from_atype;
> > > > > > > > > + }
> > > > > > > > >
> > > > > > > > > - effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
> > > > > > > > > - lockdep_is_held(&cgroup_mutex));
> > > > > > > > > + for (atype = from_atype; atype <= to_atype; atype++) {
> > > > > > > > > + progs = &cgrp->bpf.progs[atype];
> > > > > > > > > + flags = cgrp->bpf.flags[atype];
> > > > > > > > >
> > > > > > > > > - if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE)
> > > > > > > > > - cnt = bpf_prog_array_length(effective);
> > > > > > > > > - else
> > > > > > > > > - cnt = prog_list_length(progs);
> > > > > > > > > + effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
> > > > > > > > > + lockdep_is_held(&cgroup_mutex));
> > > > > > > > >
> > > > > > > > > - if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
> > > > > > > > > - return -EFAULT;
> > > > > > > > > - if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt)))
> > > > > > > > > + if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE)
> > > > > > > > > + total_cnt += bpf_prog_array_length(effective);
> > > > > > > > > + else
> > > > > > > > > + total_cnt += prog_list_length(progs);
> > > > > > > > > + }
> > > > > > > > > +
> > > > > > > > > + if (type != BPF_LSM_CGROUP)
> > > > > > > > > + if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
> > > > > > > > > + return -EFAULT;
> > > > > > > > > + if (copy_to_user(&uattr->query.prog_cnt, &total_cnt, sizeof(total_cnt)))
> > > > > > > > > return -EFAULT;
> > > > > > > > > - if (attr->query.prog_cnt == 0 || !prog_ids || !cnt)
> > > > > > > > > + if (attr->query.prog_cnt == 0 || !prog_ids || !total_cnt)
> > > > > > > > > /* return early if user requested only program count + flags */
> > > > > > > > > return 0;
> > > > > > > > > - if (attr->query.prog_cnt < cnt) {
> > > > > > > > > - cnt = attr->query.prog_cnt;
> > > > > > > > > +
> > > > > > > > > + if (attr->query.prog_cnt < total_cnt) {
> > > > > > > > > + total_cnt = attr->query.prog_cnt;
> > > > > > > > > ret = -ENOSPC;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > - if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE) {
> > > > > > > > > - return bpf_prog_array_copy_to_user(effective, prog_ids, cnt);
> > > > > > > > > - } else {
> > > > > > > > > - struct bpf_prog_list *pl;
> > > > > > > > > - u32 id;
> > > > > > > > > + for (atype = from_atype; atype <= to_atype; atype++) {
> > > > > > > > > + if (total_cnt <= 0)
> > > > > > > > > + break;
> > > > > > > > >
> > > > > > > > > - i = 0;
> > > > > > > > > - hlist_for_each_entry(pl, progs, node) {
> > > > > > > > > - prog = prog_list_prog(pl);
> > > > > > > > > - id = prog->aux->id;
> > > > > > > > > - if (copy_to_user(prog_ids + i, &id, sizeof(id)))
> > > > > > > > > - return -EFAULT;
> > > > > > > > > - if (++i == cnt)
> > > > > > > > > - break;
> > > > > > > > > + progs = &cgrp->bpf.progs[atype];
> > > > > > > > > + flags = cgrp->bpf.flags[atype];
> > > > > > > > > +
> > > > > > > > > + effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
> > > > > > > > > + lockdep_is_held(&cgroup_mutex));
> > > > > > > > > +
> > > > > > > > > + if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE)
> > > > > > > > > + cnt = bpf_prog_array_length(effective);
> > > > > > > > > + else
> > > > > > > > > + cnt = prog_list_length(progs);
> > > > > > > > > +
> > > > > > > > > + if (cnt >= total_cnt)
> > > > > > > > > + cnt = total_cnt;
> > > > > > > > > +
> > > > > > > > > + if (attr->query.query_flags & BPF_F_QUERY_EFFECTIVE) {
> > > > > > > > > + ret = bpf_prog_array_copy_to_user(effective, prog_ids, cnt);
> > > > > > > > > + } else {
> > > > > > > > > + struct bpf_prog_list *pl;
> > > > > > > > > + u32 id;
> > > > > > > > > +
> > > > > > > > > + i = 0;
> > > > > > > > > + hlist_for_each_entry(pl, progs, node) {
> > > > > > > > > + prog = prog_list_prog(pl);
> > > > > > > > > + id = prog->aux->id;
> > > > > > > > > + if (copy_to_user(prog_ids + i, &id, sizeof(id)))
> > > > > > > > > + return -EFAULT;
> > > > > > > > > + if (++i == cnt)
> > > > > > > > > + break;
> > > > > > > > > + }
> > > > > > > > > }
> > > > > > > > > +
> > > > > > > > > + if (prog_attach_flags)
> > > > > > > > > + for (i = 0; i < cnt; i++)
> > > > > > > > > + if (copy_to_user(prog_attach_flags + i, &flags, sizeof(flags)))
> > > > > > > > > + return -EFAULT;
> > > > > > > > > +
> > > > > > > > > + prog_ids += cnt;
> > > > > > > > > + total_cnt -= cnt;
> > > > > > > > > + if (prog_attach_flags)
> > > > > > > > > + prog_attach_flags += cnt;
> > > > > > > > > }
> > > > > > > > > return ret;
> > > > > > > > > }
> > > > > > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > > > > > > index 5ed2093e51cc..4137583c04a2 100644
> > > > > > > > > --- a/kernel/bpf/syscall.c
> > > > > > > > > +++ b/kernel/bpf/syscall.c
> > > > > > > > > @@ -3520,7 +3520,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
> > > > > > > > > }
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > -#define BPF_PROG_QUERY_LAST_FIELD query.prog_cnt
> > > > > > > > > +#define BPF_PROG_QUERY_LAST_FIELD query.prog_attach_flags
> > > > > > > > >
> > > > > > > > > static int bpf_prog_query(const union bpf_attr *attr,
> > > > > > > > > union bpf_attr __user *uattr)
> > > > > > > > > @@ -3556,6 +3556,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
> > > > > > > > > case BPF_CGROUP_SYSCTL:
> > > > > > > > > case BPF_CGROUP_GETSOCKOPT:
> > > > > > > > > case BPF_CGROUP_SETSOCKOPT:
> > > > > > > > > + case BPF_LSM_CGROUP:
> > > > > > > > > return cgroup_bpf_prog_query(attr, uattr);
> > > > > > > > > case BPF_LIRC_MODE2:
> > > > > > > > > return lirc_prog_query(attr, uattr);
> > > > > > > > > @@ -4066,6 +4067,7 @@ static int bpf_prog_get_info_by_fd(struct file *file,
> > > > > > > > >
> > > > > > > > > if (prog->aux->btf)
> > > > > > > > > info.btf_id = btf_obj_id(prog->aux->btf);
> > > > > > > > > + info.attach_btf_func_id = prog->aux->attach_btf_id;
> > > > > > > > Note that exposing prog->aux->attach_btf_id alone may not be enough
> > > > > > > > unless userspace can assume info.attach_btf_func_id always refers to
> > > > > > > > btf_vmlinux for all bpf prog types.
> > > > > > >
> > > > > > > We also export btf_id two lines above, right? Btw, I left a comment in
> > > > > > > the bpftool about those btf_ids; I'm not sure how to resolve them, and
> > > > > > > always assume vmlinux for now.
> > > > > > yeah, that btf_id above is the cgroup-lsm prog's btf_id, which has its
> > > > > > func info, line info, etc. It is not the one the attach_btf_id corresponds
> > > > > > to. attach_btf_id refers to either aux->attach_btf or aux->dst_prog's btf
> > > > > > (i.e. the target btf id here).
> > > > > >
> > > > > > It needs a consensus on where this attach_btf_id, target btf id, and
> > > > > > prog_attach_flags should be. If I read the patch 7 thread correctly,
> > > > > > I think Andrii is suggesting to expose them to userspace through link, so
> > > > > > potentially putting them in bpf_link_info. The bpf_prog_query will
> > > > > > output a list of link ids. The same probably applies to
> > > > >
> > > > > Yep and I think it makes sense because link is representing one
> > > > > specific attachment (and I presume flags can be stored inside the link
> > > > > itself as well, right?).
> > > > >
> > > > > But if legacy non-link BPF_PROG_ATTACH is supported then using
> > > > > bpf_link_info won't cover legacy prog-only attachments.
> > > >
> > > > I don't have any attachment to the legacy apis, I'm supporting them
> > > > only because it takes two lines of code; we can go link-only if there
> > > > is an agreement that it's inherently better.
> > > >
> > > > How about I keep sys_bpf(BPF_PROG_QUERY) as is and do a loop in
> > > > userspace (for BPF_LSM_CGROUP only) over all links
> > > > (BPF_LINK_GET_NEXT_ID) to find the ones with matching prog
> > > > ids (BPF_LINK_GET_FD_BY_ID+BPF_OBJ_GET_INFO_BY_FD)?
> > > >
> > > > That way we keep new fields in bpf_link_info, but we don't have to
> > > > extend sys_bpf(BPF_PROG_QUERY) because there doesn't seem to be a good
> > > > way to do it. Exporting links via new link_fds would mean we'd have to
> > > > support BPF_F_QUERY_EFFECTIVE, but getting an effective array of links
> > > > seems to be messy. If, in the future, we figure out a better way to
I don't see a clean way to get the effective array from an individual
link[_info] through link iteration. The effective array is the set of
progs that will run at a cgroup, and in which order. A prog running at
a cgroup is not necessarily linked to that cgroup.
If we stay with BPF_PROG_QUERY+BPF_F_QUERY_EFFECTIVE to get the
effective array, and if it is decided the addition should be done in
bpf_link_info, then a list of link ids needs to be output instead of
the current list of prog ids. The old attach types will still have to
stay with the list of prog ids though :/
It would be sad not to be able to get the effective array for
BPF_LSM_CGROUP only. I find it more useful to show what will run at a
cgroup, and in which order, than what is linked to a cgroup.
> > > > expose a list of attached/effective links per cgroup, we can
> > > > convert/optimize bpftool.
> > >
> > > Why not use iter/bpf_link program (see progs/bpf_iter_bpf_link.c for
> > > an example) instead? Once you have struct bpf_link and you know it's
> > > cgroup link, you can cast it to struct bpf_cgroup_link and get access
> > > to prog and cgroup. From cgroup to cgroup_bpf you can even get access
> > > to effective array. Basically whatever kernel has access to you can
> > > have access to from bpftool without extending any UAPIs.
> >
> > Seems a bit too involved just to read back the fields? I might as well
> > use drgn? I'm also not sure about the implementation: will I be able
> > to upcast bpf_link to bpf_cgroup_link in the bpf prog? And getting
> > attach_type might be problematic from the iterator program as well: I
> > need to call the kernel's bpf_lsm_attach_type_get to find the atype for
> > attach_btf_id, so I'd have to export it as a kfunc?
>
> I've prototyped what I suggested above, and there is another
> problem with going link-only: bpftool currently uses bpf_prog_attach
> unconditionally; we'd have to change that to use links for
> BPF_LSM_CGROUP (and pin them in some hard-coded locations?) :-(
> I'm leaning towards keeping the legacy apis around and exporting via
> prog_info; going link-only doesn't seem to have a clear benefit :-(