Message-ID: <20220412010449.vmg6r72wf7pilfkw@kafai-mbp.dhcp.thefacebook.com>
Date: Mon, 11 Apr 2022 18:04:49 -0700
From: Martin KaFai Lau <kafai@...com>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, ast@...nel.org,
daniel@...earbox.net, andrii@...nel.org
Subject: Re: [PATCH bpf-next v3 2/7] bpf: per-cgroup lsm flavor
On Mon, Apr 11, 2022 at 12:07:20PM -0700, Stanislav Fomichev wrote:
> ": , wi
>
>
>
> On Fri, Apr 8, 2022 at 3:13 PM Martin KaFai Lau <kafai@...com> wrote:
> >
> > On Thu, Apr 07, 2022 at 03:31:07PM -0700, Stanislav Fomichev wrote:
> > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > index 064eccba641d..eca258ba71d8 100644
> > > --- a/kernel/bpf/bpf_lsm.c
> > > +++ b/kernel/bpf/bpf_lsm.c
> > > @@ -35,6 +35,98 @@ BTF_SET_START(bpf_lsm_hooks)
> > > #undef LSM_HOOK
> > > BTF_SET_END(bpf_lsm_hooks)
> > >
> > > +static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> > > + const struct bpf_insn *insn)
> > > +{
> > > + const struct bpf_prog *prog;
> > > + struct socket *sock;
> > > + struct cgroup *cgrp;
> > > + struct sock *sk;
> > > + int ret = 0;
> > > + u64 *regs;
> > > +
> > > + regs = (u64 *)ctx;
> > > + sock = (void *)(unsigned long)regs[BPF_REG_0];
> > > + /*prog = container_of(insn, struct bpf_prog, insnsi);*/
> > > + prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > nit. Rename prog to shim_prog.
> >
> > > +
> > > + if (unlikely(!sock))
> > Is a NULL sock possible in the lsm hooks? Can these hooks
> > be rejected at load time instead?
>
> Doesn't seem like it can be NULL, at least from the quick review that
> I had; I'll take a deeper look.
> In general I wanted to be more defensive here because there are 200+
> hooks and new ones might arrive, so it's better to have the check?
not too worried about an extra runtime check for now.
Instead, I have a concern that it would be a usage surprise when a successfully
attached bpf program is then always silently ignored.
Another question, for example, the inet_conn_request lsm_hook:
LSM_HOOK(int, 0, inet_conn_request, const struct sock *sk, struct sk_buff *skb,
struct request_sock *req)
'struct sock *sk' is the first argument, so it will use current's cgroup.
inet_conn_request() likely runs in softirq, though, and then the cgroup
will be incorrect. This softirq case may not be limited to hooks that
take an sk/sock argument either; not sure.
> > > + return 0;
> > > +
> > > + sk = sock->sk;
> > > + if (unlikely(!sk))
> > Same here.
> >
> > > + return 0;
> > > +
> > > + cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > > + if (likely(cgrp))
unrelated, but while we are talking about extra checks:
I think the shim_prog already acts as a higher-level (per attach-btf_id)
knob, but do you think it may still be worth doing a bpf_empty_prog_array
check here in case a cgroup has no prog to run?
> > > + ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> > > + ctx, bpf_prog_run, 0);
[ ... ]
> > > @@ -100,6 +123,15 @@ static void bpf_cgroup_link_auto_detach(struct bpf_cgroup_link *link)
> > > link->cgroup = NULL;
> > > }
> > >
> > > +static void bpf_cgroup_lsm_shim_release(struct bpf_prog *prog,
> > > + enum cgroup_bpf_attach_type atype)
> > > +{
> > > + if (!prog || atype != prog->aux->cgroup_atype)
> > prog cannot be NULL here, no?
> >
> > The 'atype != prog->aux->cgroup_atype' looks suspicious also considering
> > prog->aux->cgroup_atype is only initialized (and meaningful) for BPF_LSM_CGROUP.
> > I suspect incorrectly passing this test will crash in the below
> > bpf_trampoline_unlink_cgroup_shim(). More on this later.
> >
> > > + return;
> > > +
> > > + bpf_trampoline_unlink_cgroup_shim(prog);
> > > +}
> > > +
> > > /**
> > > * cgroup_bpf_release() - put references of all bpf programs and
> > > * release all cgroup bpf data
> > > @@ -123,10 +155,16 @@ static void cgroup_bpf_release(struct work_struct *work)
> > Copying some missing loop context here:
> >
> > for (atype = 0; atype < ARRAY_SIZE(cgrp->bpf.progs); atype++) {
> > struct list_head *progs = &cgrp->bpf.progs[atype];
> > struct bpf_prog_list *pl, *pltmp;
> >
> > >
> > > list_for_each_entry_safe(pl, pltmp, progs, node) {
> > > list_del(&pl->node);
> > > - if (pl->prog)
> > > + if (pl->prog) {
> > > + bpf_cgroup_lsm_shim_release(pl->prog,
> > > + atype);
> > atype could be 0 (CGROUP_INET_INGRESS) here. bpf_cgroup_lsm_shim_release()
> > above will go ahead with bpf_trampoline_unlink_cgroup_shim().
> > It will break some of the assumptions. e.g. prog->aux->attach_btf is NULL
> > for CGROUP_INET_INGRESS.
> >
> > Instead, only call bpf_cgroup_lsm_shim_release() for BPF_LSM_CGROUP ?
> >
> > If the above observation is sane, I wonder if the existing test_progs
> > have uncovered it, or maybe the existing tests always detach cleanly
> > before cleaning up the cgroup, which would avoid this case.
>
> Might be what's happening here:
>
> https://github.com/kernel-patches/bpf/runs/5876983908?check_suite_focus=true
hmm.... this one looks different. I am thinking the oops should happen
in bpf_obj_id(), which is not inlined. It didn't ring any bell after a
quick look, so yeah, let's fix the known issue first.
>
> Although, I'm not sure why it's z15 only. Good point on filtering by
> BPF_LSM_CGROUP, will do.
>
> > > bpf_prog_put(pl->prog);
> > > - if (pl->link)
> > > + }
> > > + if (pl->link) {
> > > + bpf_cgroup_lsm_shim_release(pl->link->link.prog,
> > > + atype);
> > > bpf_cgroup_link_auto_detach(pl->link);
> > > + }
> > > kfree(pl);
> > > static_branch_dec(&cgroup_bpf_enabled_key[atype]);
> > > }
[ ... ]
> > > +int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
> > > + struct bpf_attach_target_info *tgt_info)
> > > +{
> > > + struct bpf_prog *shim_prog = NULL;
> > > + struct bpf_trampoline *tr;
> > > + bpf_func_t bpf_func;
> > > + u64 key;
> > > + int err;
> > > +
> > > + key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
> > > + prog->aux->attach_btf_id);
> > > +
> > > + err = bpf_lsm_find_cgroup_shim(prog, &bpf_func);
> > > + if (err)
> > > + return err;
> > > +
> > > + tr = bpf_trampoline_get(key, tgt_info);
> > > + if (!tr)
> > > + return -ENOMEM;
> > > +
> > > + mutex_lock(&tr->mutex);
> > > +
> > > + shim_prog = cgroup_shim_find(tr, bpf_func);
> > > + if (shim_prog) {
> > > + /* Reusing existing shim attached by the other program.
> > > + */
> > The shim_prog is reused by >1 BPF_LSM_CGROUP progs and is hidden from
> > userspace (no id), so it may be worth bringing this up:
> >
> > In __bpf_prog_enter(), aside from some bpf stats of the shim_prog
> > becoming useless (a very minor thing), it also checks shim_prog->active
> > and bumps the misses counter. Now the misses counter is no longer
> > visible to users. Since it is actually running the cgroup prog,
> > maybe there is no need for the active check?
>
> Agree that the active counter will probably be taken care of when the
> actual program runs;
iirc, the BPF_PROG_RUN_ARRAY_CG does not need the active counter.
> but not sure it's worth the effort of trying to
> remove it here?
I was thinking: if the active counter got triggered and a BPF_LSM_CGROUP
call got missed, there would be no way to tell this case was hit without
exposing the stats of the shim_prog, and it could be a pretty hard
problem to chase. It probably won't be an issue for non-sleepable now
if the rcu_read_lock() maps to preempt_disable(). Not sure about the
future sleepable case.
I am thinking of avoiding all the active-count and stats-count work
in __bpf_prog_enter() and __bpf_prog_exit() for BPF_LSM_CGROUP. afaik,
only rcu_read_lock() and rcu_read_unlock() are useful to protect
the shim_prog itself. Maybe a __bpf_nostats_enter() and
__bpf_nostats_exit().
> Regarding "no longer visible to users": that's a good point. Should I
> actually add those shim progs to the prog_idr? Or just hide it as
> "internal implementation detail"?
Then no need to expose the shim_progs to the idr.
~~~~
[ btw, while thinking about the shim_prog, I also think there is no need
for one shim_prog per attach_btf_id, which is essentially
prog->aux->cgroup_atype. The static prog->aux->cgroup_atype could be
passed on the stack when preparing the trampoline.
Just an idea, not suggesting it must be done now. This can be
optimized later since it does not affect the API. ]