Message-ID: <e693a6f3-7058-0bc8-d5a6-f1572ebd75f8@fb.com>
Date: Mon, 23 May 2022 08:41:38 -0700
From: Yonghong Song <yhs@...com>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, ast@...nel.org,
daniel@...earbox.net, andrii@...nel.org
Subject: Re: [PATCH bpf-next v7 03/11] bpf: per-cgroup lsm flavor
On 5/20/22 5:03 PM, Stanislav Fomichev wrote:
> On Thu, May 19, 2022 at 6:01 PM Yonghong Song <yhs@...com> wrote:
>>
>>
>>
>> On 5/18/22 3:55 PM, Stanislav Fomichev wrote:
>>> Allow attaching to lsm hooks in the cgroup context.
>>>
>>> Attaching to per-cgroup LSM works exactly like attaching
>>> to other per-cgroup hooks. New BPF_LSM_CGROUP is added
>>> to trigger new mode; the actual lsm hook we attach to is
>>> signaled via existing attach_btf_id.
>>>
>>> For the hooks that have 'struct socket' or 'struct sock' as their first
>>> argument, we use the cgroup associated with that socket. For the rest,
>>> we use the 'current' cgroup (this is all on default hierarchy == v2 only).
>>> Note that for some hooks that work on 'struct sock' we still
>>> take the cgroup from 'current' because some of them work on the socket
>>> that hasn't been properly initialized yet.
>>>
>>> Behind the scenes, we allocate a shim program that is attached
>>> to the trampoline and runs cgroup effective BPF programs array.
>>> This shim has some rudimentary ref counting and can be shared
>>> between several programs attaching to the same per-cgroup lsm hook.
>>>
>>> Note that this patch bloats the cgroup size because we add 211
>>> cgroup_bpf_attach_type(s) for simplicity's sake. This will be
>>> addressed in a subsequent patch.
>>>
>>> Also note that we only add non-sleepable flavor for now. To enable
>>> sleepable use-cases, bpf_prog_run_array_cg has to grab trace rcu,
>>> shim programs have to be freed via trace rcu, cgroup_bpf.effective
>>> should be also trace-rcu-managed + maybe some other changes that
>>> I'm not aware of.
>>>
>>> Signed-off-by: Stanislav Fomichev <sdf@...gle.com>
>>> ---
>>> arch/x86/net/bpf_jit_comp.c | 24 +++--
>>> include/linux/bpf-cgroup-defs.h | 6 ++
>>> include/linux/bpf-cgroup.h | 7 ++
>>> include/linux/bpf.h | 25 +++++
>>> include/linux/bpf_lsm.h | 14 +++
>>> include/linux/btf_ids.h | 3 +-
>>> include/uapi/linux/bpf.h | 1 +
>>> kernel/bpf/bpf_lsm.c | 50 +++++++++
>>> kernel/bpf/btf.c | 11 ++
>>> kernel/bpf/cgroup.c | 181 ++++++++++++++++++++++++++++---
>>> kernel/bpf/core.c | 2 +
>>> kernel/bpf/syscall.c | 10 ++
>>> kernel/bpf/trampoline.c | 184 ++++++++++++++++++++++++++++++++
>>> kernel/bpf/verifier.c | 28 +++++
>>> tools/include/linux/btf_ids.h | 4 +-
>>> tools/include/uapi/linux/bpf.h | 1 +
>>> 16 files changed, 527 insertions(+), 24 deletions(-)
>>
>> A few nits below.
>>
>>>
>>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>>> index a2b6d197c226..5cdebf4312da 100644
>>> --- a/arch/x86/net/bpf_jit_comp.c
>>> +++ b/arch/x86/net/bpf_jit_comp.c
>>> @@ -1765,6 +1765,10 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
>>> struct bpf_tramp_link *l, int stack_size,
>>> int run_ctx_off, bool save_ret)
>>> {
>>> + void (*exit)(struct bpf_prog *prog, u64 start,
>>> + struct bpf_tramp_run_ctx *run_ctx) = __bpf_prog_exit;
>>> + u64 (*enter)(struct bpf_prog *prog,
>>> + struct bpf_tramp_run_ctx *run_ctx) = __bpf_prog_enter;
>>> u8 *prog = *pprog;
>>> u8 *jmp_insn;
>>> int ctx_cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie);
[...]
>>> return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
>>> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
>>> index 134785ab487c..2c356a38f4cf 100644
>>> --- a/kernel/bpf/cgroup.c
>>> +++ b/kernel/bpf/cgroup.c
>>> @@ -14,6 +14,9 @@
>>> #include <linux/string.h>
>>> #include <linux/bpf.h>
>>> #include <linux/bpf-cgroup.h>
>>> +#include <linux/btf_ids.h>
>>> +#include <linux/bpf_lsm.h>
>>> +#include <linux/bpf_verifier.h>
>>> #include <net/sock.h>
>>> #include <net/bpf_sk_storage.h>
>>>
>>> @@ -61,6 +64,85 @@ bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
>>> return run_ctx.retval;
>>> }
>>>
>>> +unsigned int __cgroup_bpf_run_lsm_sock(const void *ctx,
>>> + const struct bpf_insn *insn)
>>> +{
>>> + const struct bpf_prog *shim_prog;
>>> + struct sock *sk;
>>> + struct cgroup *cgrp;
>>> + int ret = 0;
>>> + u64 *regs;
>>> +
>>> + regs = (u64 *)ctx;
>>> + sk = (void *)(unsigned long)regs[BPF_REG_0];
>>
>> Maybe it's just my own opinion, but using BPF_REG_0 as the index is a
>> little bit confusing. Maybe just use '0' to indicate the first
>> parameter. Changing 'regs' to 'params' might also be a better choice?
>> In reality, the trampoline just passes an array of parameters to
>> the program. The same applies in a few places below.
>
> Sure, let's rename it and use 0. I'll do args instead of params maybe?
'args' works for me too.
>
>>> + /*shim_prog = container_of(insn, struct bpf_prog, insnsi);*/
>>> + shim_prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
>>
>> I didn't experiment, but why container_of won't work?
>
> There is a type check in container_of that doesn't seem to work for flex arrays:
>
> kernel/bpf/cgroup.c:78:14: error: static_assert failed due to
> requirement '__builtin_types_compatible_p(const struct bpf_insn,
> struct bpf_insn []"
> shim_prog = container_of(insn, struct bpf_prog, insnsi);
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ./include/linux/container_of.h:19:2: note: expanded from macro 'container_of'
> static_assert(__same_type(*(ptr), ((type *)0)->member) || \
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ./include/linux/build_bug.h:77:34: note: expanded from macro 'static_assert'
> #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ./include/linux/build_bug.h:78:41: note: expanded from macro '__static_assert'
> #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
> ^ ~~~~
> 1 error generated.
You are right. Thanks for the explanation.
[...]