linux-kernel - Re: [RFC PATCH bpf-next 6/8] sched_ext: Add filter for scx_kfunc_ids

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Z6jIS91qpNYtvRXr@gpd3>
Date: Sun, 9 Feb 2025 16:22:51 +0100
From: Andrea Righi <arighi@...dia.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Juntong Deng <juntong.deng@...look.com>,
	Alexei Starovoitov <ast@...nel.org>,
	Daniel Borkmann <daniel@...earbox.net>,
	John Fastabend <john.fastabend@...il.com>,
	Andrii Nakryiko <andrii@...nel.org>,
	Martin KaFai Lau <martin.lau@...ux.dev>, Eddy Z <eddyz87@...il.com>,
	Song Liu <song@...nel.org>, Yonghong Song <yonghong.song@...ux.dev>,
	KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>,
	Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
	Kumar Kartikeya Dwivedi <memxor@...il.com>,
	Tejun Heo <tj@...nel.org>, David Vernet <void@...ifault.com>,
	changwoo@...lia.com, bpf <bpf@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH bpf-next 6/8] sched_ext: Add filter for
 scx_kfunc_ids_unlocked

On Fri, Feb 07, 2025 at 07:37:51PM -0800, Alexei Starovoitov wrote:
> On Wed, Feb 5, 2025 at 11:35 AM Juntong Deng <juntong.deng@...look.com> wrote:
> >
> > This patch adds filter for scx_kfunc_ids_unlocked.
> >
> > The kfuncs in the scx_kfunc_ids_unlocked set can be used in init, exit,
> > cpu_online, cpu_offline, init_task, dump, cgroup_init, cgroup_exit,
> > cgroup_prep_move, cgroup_cancel_move, cgroup_move, cgroup_set_weight
> > operations.
> >
> > Signed-off-by: Juntong Deng <juntong.deng@...look.com>
> > ---
> >  kernel/sched/ext.c | 30 ++++++++++++++++++++++++++++++
> >  1 file changed, 30 insertions(+)
> >
> > diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> > index 7f039a32f137..955fb0f5fc5e 100644
> > --- a/kernel/sched/ext.c
> > +++ b/kernel/sched/ext.c
> > @@ -7079,9 +7079,39 @@ BTF_ID_FLAGS(func, scx_bpf_dispatch_from_dsq, KF_RCU)
> >  BTF_ID_FLAGS(func, scx_bpf_dispatch_vtime_from_dsq, KF_RCU)
> >  BTF_KFUNCS_END(scx_kfunc_ids_unlocked)
> >
> > +static int scx_kfunc_ids_unlocked_filter(const struct bpf_prog *prog, u32 kfunc_id)
> > +{
> > +       u32 moff;
> > +
> > +       if (!btf_id_set8_contains(&scx_kfunc_ids_unlocked, kfunc_id) ||
> > +           prog->aux->st_ops != &bpf_sched_ext_ops)
> > +               return 0;
> > +
> > +       moff = prog->aux->attach_st_ops_member_off;
> > +       if (moff == offsetof(struct sched_ext_ops, init) ||
> > +           moff == offsetof(struct sched_ext_ops, exit) ||
> > +           moff == offsetof(struct sched_ext_ops, cpu_online) ||
> > +           moff == offsetof(struct sched_ext_ops, cpu_offline) ||
> > +           moff == offsetof(struct sched_ext_ops, init_task) ||
> > +           moff == offsetof(struct sched_ext_ops, dump))
> > +               return 0;
> > +
> > +#ifdef CONFIG_EXT_GROUP_SCHED
> > +       if (moff == offsetof(struct sched_ext_ops, cgroup_init) ||
> > +           moff == offsetof(struct sched_ext_ops, cgroup_exit) ||
> > +           moff == offsetof(struct sched_ext_ops, cgroup_prep_move) ||
> > +           moff == offsetof(struct sched_ext_ops, cgroup_cancel_move) ||
> > +           moff == offsetof(struct sched_ext_ops, cgroup_move) ||
> > +           moff == offsetof(struct sched_ext_ops, cgroup_set_weight))
> > +               return 0;
> > +#endif
> > +       return -EACCES;
> > +}
> > +
> >  static const struct btf_kfunc_id_set scx_kfunc_set_unlocked = {
> >         .owner                  = THIS_MODULE,
> >         .set                    = &scx_kfunc_ids_unlocked,
> > +       .filter                 = scx_kfunc_ids_unlocked_filter,
> >  };
> 
> why does sched-ext use so many id_set-s ?
> 
>         if ((ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
>                                              &scx_kfunc_set_select_cpu)) ||
>             (ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
> 
> &scx_kfunc_set_enqueue_dispatch)) ||
>             (ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
>                                              &scx_kfunc_set_dispatch)) ||
>             (ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
>                                              &scx_kfunc_set_cpu_release)) ||
>             (ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
>                                              &scx_kfunc_set_unlocked)) ||
> 
> Can they all be rolled into one id_set then
> the patches 2-6 will be collapsed into one patch and
> one filter callback that will describe allowed hook/kfunc combinations?

I think the idea was to group them in different sets based on their context
usage, like scx_kfunc_set_select_cpu kfuncs can be used only from
ops.select_cpu(), scx_kfunc_set_dispatch kfuncs can be used only from
ops.dispatch(), etc.

However, since the actual context enforcement is done by scx_kf_allowed(),
it seems that we could have just 3 sets to classify the kfuncs based by
their prog type:
 1) BPF_PROG_TYPE_STRUCT_OPS
 2) BPF_PROG_TYPE_STRUCT_OPS + BPF_PROG_TYPE_SYSCALL
 3) BPF_PROG_TYPE_STRUCT_OPS + BPF_PROG_TYPE_SYSCALL + BPF_PROG_TYPE_TRACING

-Andrea