[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200326233533.gbyogvi57xufe34d@ast-mbp>
Date: Thu, 26 Mar 2020 16:35:33 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Andrii Nakryiko <andriin@...com>
Cc: bpf@...r.kernel.org, netdev@...r.kernel.org, ast@...com,
daniel@...earbox.net, rdna@...com, andrii.nakryiko@...il.com,
kernel-team@...com
Subject: Re: [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for
an active bpf_cgroup_link
On Tue, Mar 24, 2020 at 11:57:44PM -0700, Andrii Nakryiko wrote:
>
> +/* Swap updated BPF program for given link in effective program arrays across
> + * all descendant cgroups. This function is guaranteed to succeed.
> + */
> +static void replace_effective_prog(struct cgroup *cgrp,
> + enum bpf_attach_type type,
> + struct bpf_cgroup_link *link)
> +{
> + struct bpf_prog_array_item *item;
> + struct cgroup_subsys_state *css;
> + struct bpf_prog_array *progs;
> + struct bpf_prog_list *pl;
> + struct list_head *head;
> + struct cgroup *cg;
> + int pos;
> +
> + css_for_each_descendant_pre(css, &cgrp->self) {
> + struct cgroup *desc = container_of(css, struct cgroup, self);
> +
> + if (percpu_ref_is_zero(&desc->bpf.refcnt))
> + continue;
> +
> + /* found position of link in effective progs array */
> + for (pos = 0, cg = desc; cg; cg = cgroup_parent(cg)) {
> + if (pos && !(cg->bpf.flags[type] & BPF_F_ALLOW_MULTI))
> + continue;
> +
> + head = &cg->bpf.progs[type];
> + list_for_each_entry(pl, head, node) {
> + if (!prog_list_prog(pl))
> + continue;
> + if (pl->link == link)
> + goto found;
> + pos++;
> + }
> + }
> +found:
> + BUG_ON(!cg);
> + progs = rcu_dereference_protected(
> + desc->bpf.effective[type],
> + lockdep_is_held(&cgroup_mutex));
> + item = &progs->items[pos];
> + WRITE_ONCE(item->prog, link->link.prog);
> + }
> +}
> +
> +/**
> + * __cgroup_bpf_replace() - Replace link's program and propagate the change
> + * to descendants
> + * @cgrp: The cgroup which descendants to traverse
> + * @link: A link for which to replace BPF program
> + * @type: Type of attach operation
> + *
> + * Must be called with cgroup_mutex held.
> + */
> +int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
> + struct bpf_prog *new_prog)
> +{
> + struct list_head *progs = &cgrp->bpf.progs[link->type];
> + struct bpf_prog *old_prog;
> + struct bpf_prog_list *pl;
> + bool found = false;
> +
> + if (link->link.prog->type != new_prog->type)
> + return -EINVAL;
> +
> + list_for_each_entry(pl, progs, node) {
> + if (pl->link == link) {
> + found = true;
> + break;
> + }
> + }
> + if (!found)
> + return -ENOENT;
> +
> + old_prog = xchg(&link->link.prog, new_prog);
> + replace_effective_prog(cgrp, link->type, link);
I think with 'found = true' in this function you're assuming that it will be
found in replace_effective_prog() ? I don't think that's the case.
Try to create bpf_link with BPF_F_ALLOW_OVERRIDE, override it in a child cgroup
with another link and then try to LINK_UPDATE the former. The link is there,
but the prog is not executing and it's not in effective array. What LINK_UPDATE
suppose to do? I guess it should succeed?
Even trickier that the prog will be in effective array in some of
css_for_each_descendant_pre() and not in others. This cgroup attach semantics
were convoluted from the day one. Apparently people use all three variants now,
but I wouldn't bet that everyone understands it.
Hence my proposal to support F_ALLOW_MULTI for links only. At least initially.
It's so much simpler to explain. And owning bpf_link will guarantee that the
prog is executing (unless cgroup is removed and sockets are closed). I guess
default (no-override) is acceptable to bpf_link as well and in that sense it
will be very similar to XDP with single prog attached. So I think I can live
with default and ALLOW_MULTI for now. But we should probably redesign
overriding capabilities. Folks need to attach multiple progs to a given cgroup
and disallow all progs in children. Currently it's not possible to do, since
MULTI in the parent allows at least one (default, override or multi) in the
children. bpf_link inheriting this logic won't help to solve this use case. It
feels that link should stay as multi only and override or not in the children
should be a separate property. Probably not related to link at all. It fits
better as a cgroup permission.
Anyhow I'm going to apply patches 1 and 2, since they are good cleanup
regardless of what we decide here.
Powered by blists - more mailing lists