[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAADnVQ+_ViQ6GLgscTLuWnsuhS8eajNSDG3jpTAjYWUoDBJvTg@mail.gmail.com>
Date: Thu, 9 Jan 2025 13:24:49 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Song Liu <song@...nel.org>
Cc: Juntong Deng <juntong.deng@...look.com>, Tejun Heo <tj@...nel.org>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.fastabend@...il.com>, Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Eddy Z <eddyz87@...il.com>,
Yonghong Song <yonghong.song@...ux.dev>, KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
Kumar Kartikeya Dwivedi <memxor@...il.com>, snorcht@...il.com,
Christian Brauner <brauner@...nel.org>, bpf <bpf@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Linux-Fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: per st_ops kfunc allow/deny mask. Was: [PATCH bpf-next v6 4/5]
bpf: Make fs kfuncs available for SYSCALL program type
On Thu, Jan 9, 2025 at 12:49 PM Song Liu <song@...nel.org> wrote:
>
> On Thu, Jan 9, 2025 at 11:24 AM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Mon, Dec 23, 2024 at 4:51 PM Juntong Deng <juntong.deng@...look.com> wrote:
> > >
> > > >
> > > > The main goal is to get rid of run-time mask check in SCX_CALL_OP() and
> > > > make it static by the verifier. To make that happen scx_kf_mask flags
> > > > would need to become KF_* flags while each struct-ops callback will
> > > > specify the expected mask.
> > > > Then at struct-ops prog attach time the verifier will see the expected mask
> > > > and can check that all kfuncs calls of this particular program
> > > > satisfy the mask. Then all of the runtime overhead of
> > > > current->scx.kf_mask and scx_kf_allowed() will go away.
> > >
> > > Thanks for pointing this out.
> > >
> > > Yes, I am interested in working on it.
> > >
> > > I will try to solve this problem in a separate patch series.
> > >
> > >
> > > The following are my thoughts:
> > >
> > > Should we really use KF_* to do this? I think KF_* is currently more
> > > like declaring that a kfunc has some kind of attribute, e.g.
> > > KF_TRUSTED_ARGS means that the kfunc only accepts trusted arguments,
> > > rather than being used to categorise kfuncs.
> > >
> > > It is not sustainable to restrict the kfuncs that can be used based on
> > > program types, which are coarse-grained. This problem will get worse
> > > as kfuncs increase.
> > >
> > > In my opinion, managing the kfuncs available to bpf programs should be
> > > implemented as capabilities. Capabilities are a mature permission model.
> > > We can treat a set of kfuncs as a capability (like the various current
> > > kfunc_sets, but the current kfunc_sets did not carefully divide
> > > permissions).
> > >
> > > We should use separate BPF_CAP_XXX flags to manage these capabilities.
> > > For example, SCX may define BPF_CAP_SCX_DISPATCH.
> > >
> > > For program types, we should divide them into two levels, types and
> > > subtypes. Types are used to register common capabilities and subtypes
> > > are used to register specific capabilities. The verifier can check if
> > > the used kfuncs are allowed based on the type and subtype of the bpf
> > > program.
> > >
> > > I understand that we need to maintain backward compatibility to
> > > userspace, but capabilities are internal changes in the kernel.
> > > Perhaps we can make the current program types as subtypes and
> > > add 'types' that are only used internally, and more subtypes
> > > (program types) can be added in the future.
> >
> > Sorry for the delay.
> > imo CAP* approach doesn't fit.
> > caps are security bits exposed to user space.
> > Here there is no need to expose anything to user space.
> >
> > But you're also correct that we cannot extend kfunc KF_* flags
> > that easily. KF_* flags are limited to 32-bit and we're already
> > using 12 bits.
> > enum scx_kf_mask needs 5 bits, so we can squeeze them into
> > the current 32-bit field _for now_,
> > but eventually we'd need to refactor kfunc definition into a wider set:
> > BTF_ID_FLAGS(func, .. KF_*)
> > so that different struct_ops consumers can define their own bits.
> >
> > Right now SCX is the only st_ops consumer who needs this feature,
> > so let's squeeze into the existing KF facility.
> >
> > First step is to remap scx_kf_mask bits into unused bits in KF_
> > and annotate corresponding sched-ext kfuncs with it.
> > For example:
> > SCX_KF_DISPATCH will become
> > KF_DISPATCH (1 << 13)
> >
> > and all kfuncs that are allowed to be called from ->dispatch() callback
> > will be annotated like:
> > - BTF_KFUNCS_START(scx_kfunc_ids_dispatch)
> > - BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots)
> > - BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel)
> > + BTF_KFUNCS_START(scx_kfunc_ids_dispatch)
> > + BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots, KF_DISPATCH)
> > + BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel, KF_DISPATCH)
> >
> >
> > For sched_ext_ops callback annotations, I think,
> > the simplest approach is to add special
> > BTF_SET8_START(st_ops_flags)
> > BTF_ID_FLAGS(func, sched_ext_ops__dispatch, KF_DISPATCH)
> > and so on for other ops stubs.
> >
> > sched_ext_ops__dispatch() is an empty function that
> > exists in the vmlinux, and though it's not a kfunc
> > we can use it to annotate
> > (struct sched_ext_ops *)->dispatch() callback
> > with a particular KF_ flag
> > (or a set of flags for SCX_KF_RQ_LOCKED case).
> >
> > Then the verifier (while analyzing the program that is targeted
> > to be attach to this ->dispatch() hook)
> > will check this extra KF flag in st_ops
> > and will only allow to call kfuncs with matching flags:
> >
> > if (st_ops->kf_mask & kfunc->kf_mask) // ok to call kfunc from this callback
> >
> > The end result current->scx.kf_mask will be removed
> > and instead of run-time check it will become static verifier check.
>
> Shall we move some of these logics from verifier core to
> btf_kfunc_id_set.filter()? IIUC, this would avoid using extra
> KF_* bits. To make the filter functions more capable, we
> probably need to pass bpf_verifier_env into the filter() function.
Passing env is probably unnecessary,
but if save 'moff':
const struct btf_type *t,
const struct btf_member *member,
u32 moff = __btf_member_bit_offset(t, member) / 8;
after successful check_struct_ops_btf_id() somewhere in prog->aux
then btf_kfunc_id_set.filter() can indeed do
moff == offsetof(struct sched_ext_ops, dispatch)
allow kfuncs suitable for dispatch.
Powered by blists - more mailing lists