Message-Id: <Yg52EAB3ncoj22iK@google.com>
Date: Thu, 17 Feb 2022 08:21:36 -0800
From: sdf@...gle.com
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, ast@...nel.org,
daniel@...earbox.net, andrii@...nel.org
Subject: Re: [RFC bpf-next 1/4] bpf: cgroup_sock lsm flavor
On 02/16, Alexei Starovoitov wrote:
> On Tue, Feb 15, 2022 at 04:12:38PM -0800, Stanislav Fomichev wrote:
> > {
> > @@ -1767,14 +1769,23 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
> >
> > /* arg1: lea rdi, [rbp - stack_size] */
> > EMIT4(0x48, 0x8D, 0x7D, -stack_size);
> > - /* arg2: progs[i]->insnsi for interpreter */
> > - if (!p->jited)
> > - emit_mov_imm64(&prog, BPF_REG_2,
> > - (long) p->insnsi >> 32,
> > - (u32) (long) p->insnsi);
> > - /* call JITed bpf program or interpreter */
> > - if (emit_call(&prog, p->bpf_func, prog))
> > - return -EINVAL;
> > +
> > + if (p->expected_attach_type == BPF_LSM_CGROUP_SOCK) {
> > + /* arg2: progs[i] */
> > + emit_mov_imm64(&prog, BPF_REG_2, (long) p >> 32, (u32) (long) p);
> > + if (emit_call(&prog, __cgroup_bpf_run_lsm_sock, prog))
> > + return -EINVAL;
> > + } else {
> > + /* arg2: progs[i]->insnsi for interpreter */
> > + if (!p->jited)
> > + emit_mov_imm64(&prog, BPF_REG_2,
> > + (long) p->insnsi >> 32,
> > + (u32) (long) p->insnsi);
> > +
> > + /* call JITed bpf program or interpreter */
> > + if (emit_call(&prog, p->bpf_func, prog))
> > + return -EINVAL;
> Overall I think it's a workable solution.
> As far as the mechanism goes, I think it would be better
> to allocate a single dummy bpf_prog and use the normal fmod_ret
> registration mechanism instead of hacking the arch trampoline bits.
> Set dummy_bpf_prog->bpf_func = __cgroup_bpf_run_lsm_sock;
> and keep dummy_bpf_prog->jited = false;
> From the p->insnsi pointer in arg2 it's easy to go back to the struct bpf_prog.
> Such a dummy prog might even be statically defined like dummy_bpf_prog,
> or allocated dynamically once.
> It can be added as fmod_ret to multiple trampolines.
> Just gut the func_model check.
Oooh, that's much cleaner, thanks!
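To make sure I got it right, something like this? (A rough, untested
sketch; the shim name is made up, and I'm relying on insnsi being the
flexible array at the end of struct bpf_prog.)

static unsigned int bpf_lsm_cgroup_sock_shim(const void *ctx,
					     const struct bpf_insn *insn)
{
	const struct bpf_prog *prog;

	/* arg2 is p->insnsi for !jited progs, so recover p from it */
	prog = container_of(insn, struct bpf_prog, insnsi[0]);
	return __cgroup_bpf_run_lsm_sock((u64 *)ctx, prog);
}

static struct bpf_prog dummy_bpf_prog = {
	.bpf_func = bpf_lsm_cgroup_sock_shim,
	/* ->jited stays false so the trampoline passes ->insnsi in arg2 */
};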
> As far as the api goes, the attach should probably be to a cgroup+lsm_hook pair.
> link_create.target_fd will be cgroup_fd.
> At prog load time attach_btf_id should probably be one
> of the existing bpf_lsm_* hooks.
> Feels wrong to duplicate the whole set into an lsm_cgroup_sock set.
lsm_cgroup_sock is there to further limit which particular lsm
hooks BPF_LSM_CGROUP_SOCK can use. I guess I can instead look at
the BTF of the hook's first argument to verify that it's a
'struct socket'? Let me see whether that's a good alternative...
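Something like this at prog load time, maybe (a rough sketch using the
in-kernel BTF helpers; modifier-skipping and error handling omitted):

static bool bpf_lsm_hook_arg0_is_socket(const struct btf *btf,
					u32 attach_btf_id)
{
	const struct btf_type *t, *proto, *arg;
	const struct btf_param *params;

	t = btf_type_by_id(btf, attach_btf_id);	/* BTF_KIND_FUNC */
	proto = btf_type_by_id(btf, t->type);	/* BTF_KIND_FUNC_PROTO */
	if (btf_type_vlen(proto) < 1)
		return false;

	params = btf_params(proto);
	arg = btf_type_by_id(btf, params[0].type);
	if (!btf_type_is_ptr(arg))
		return false;

	/* expect 'struct socket' behind the pointer */
	arg = btf_type_by_id(btf, arg->type);
	return btf_type_is_struct(arg) &&
	       !strcmp(btf_name_by_offset(btf, arg->name_off), "socket");
}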
> It's fine to have prog->expected_attach_type == BPF_LSM_CGROUP_SOCK
> to disambiguate. Will we only have these two:
> BPF_LSM_CGROUP_SOCK and BPF_LSM_CGROUP_TASK?
I hope so, unless objects other than sockets and tasks can have a cgroup
association.
> > +int __cgroup_bpf_run_lsm_sock(u64 *regs, const struct bpf_prog *prog)
> > +{
> > + struct socket *sock = (void *)regs[BPF_REG_0];
> > + struct cgroup *cgrp;
> > + struct sock *sk;
> > +
> > + sk = sock->sk;
> > + if (!sk)
> > + return 0;
> > +
> > + cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > +
> > + return BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> > + regs, bpf_prog_run, 0);
> > +}
> Would it be fast enough?
> We went through a bunch of optimizations for the normal cgroup hooks and ended up with:
> if (cgroup_bpf_enabled(CGROUP_INET_INGRESS) &&
> cgroup_bpf_sock_enabled(sk, CGROUP_INET_INGRESS))
> Here the trampoline code plus the call into __cgroup_bpf_run_lsm_sock
> will be there for all cgroups,
> since the cgroup-specific check will be inside BPF_PROG_RUN_ARRAY_CG.
> I suspect it's ok, since link_create will be for a few specific lsm hooks
> which are typically not in the fast path,
> unlike a traditional cgroup hook like ingress that is hot.
Right, cgroup_bpf_enabled() is not needed because lsm hooks are by
definition off/unattached by default. Seems like we can add
cgroup_bpf_sock_enabled() to __cgroup_bpf_run_lsm_sock.
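I.e., roughly (a sketch on top of the RFC patch):

int __cgroup_bpf_run_lsm_sock(u64 *regs, const struct bpf_prog *prog)
{
	struct socket *sock = (void *)regs[BPF_REG_0];
	struct cgroup *cgrp;
	struct sock *sk;

	sk = sock->sk;
	if (!sk)
		return 0;

	/* new: cheap per-socket check before walking the run array */
	if (!cgroup_bpf_sock_enabled(sk, prog->aux->cgroup_atype))
		return 0;

	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);

	return BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
				     regs, bpf_prog_run, 0);
}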
> For BPF_LSM_CGROUP_TASK it will take the cgroup from current instead of the sock,
> right?
Right. Seems like the only difference is where we get the cgroup pointer
from: current vs sock->cgroup. Although, I'm a bit unsure whether to
allow hooks that are clearly sock-cgroup-based to use
BPF_LSM_CGROUP_TASK. For example, should we allow BPF_LSM_CGROUP_TASK
to attach to socket_post_create? I'd prohibit that, at least initially,
to avoid subtle 'why do my programs sometimes trigger on the wrong
cgroup' types of issues.
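For reference, the task flavor would then be just (a hypothetical
sketch mirroring __cgroup_bpf_run_lsm_sock):

int __cgroup_bpf_run_lsm_task(u64 *regs, const struct bpf_prog *prog)
{
	struct cgroup *cgrp;
	int ret;

	/* only difference vs the sock flavor: cgroup comes from current */
	rcu_read_lock();
	cgrp = task_dfl_cgroup(current);
	ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
				    regs, bpf_prog_run, 0);
	rcu_read_unlock();
	return ret;
}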
> Args access should magically work. 'regs' above should be fine for
> all lsm hooks.
> The typical prog:
> +SEC("lsm_cgroup_sock/socket_post_create")
> +int BPF_PROG(socket_post_create, struct socket *sock, int family,
> + int type, int protocol, int kern)
> looks good too.
> Feels natural.
> I guess they can be sleepable too?
Haven't gone into the sleepable world yet, but I don't see any reason why
there couldn't be a sleepable variant.
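FWIW, I'd expect a complete prog to look roughly like this (a sketch;
assuming the regular BPF LSM return convention, where a negative errno
rejects the operation, carries over to this flavor):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <errno.h>

SEC("lsm_cgroup_sock/socket_post_create")
int BPF_PROG(socket_post_create, struct socket *sock, int family,
	     int type, int protocol, int kern)
{
	/* e.g. deny raw sockets for tasks in the attached cgroup */
	if (type == SOCK_RAW)
		return -EPERM;
	return 0;
}

char _license[] SEC("license") = "GPL";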
Thank you for the review!