Message-ID:
 <AM6PR03MB508083D0D2632436DCBEC0A399FE2@AM6PR03MB5080.eurprd03.prod.outlook.com>
Date: Fri, 14 Feb 2025 20:30:32 +0000
From: Juntong Deng <juntong.deng@...look.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Alexei Starovoitov <ast@...nel.org>,
 Daniel Borkmann <daniel@...earbox.net>,
 John Fastabend <john.fastabend@...il.com>,
 Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau
 <martin.lau@...ux.dev>, Eddy Z <eddyz87@...il.com>,
 Song Liu <song@...nel.org>, Yonghong Song <yonghong.song@...ux.dev>,
 KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>,
 Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
 Kumar Kartikeya Dwivedi <memxor@...il.com>, Tejun Heo <tj@...nel.org>,
 David Vernet <void@...ifault.com>, Andrea Righi <arighi@...dia.com>,
 changwoo@...lia.com, bpf <bpf@...r.kernel.org>,
 LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH bpf-next 6/8] sched_ext: Add filter for
 scx_kfunc_ids_unlocked

On 2025/2/11 03:48, Alexei Starovoitov wrote:
> On Mon, Feb 10, 2025 at 3:40 PM Juntong Deng <juntong.deng@...look.com> wrote:
>>
>> On 2025/2/8 03:37, Alexei Starovoitov wrote:
>>> On Wed, Feb 5, 2025 at 11:35 AM Juntong Deng <juntong.deng@...look.com> wrote:
>>>>
>>>> This patch adds filter for scx_kfunc_ids_unlocked.
>>>>
>>>> The kfuncs in the scx_kfunc_ids_unlocked set can be used in init, exit,
>>>> cpu_online, cpu_offline, init_task, dump, cgroup_init, cgroup_exit,
>>>> cgroup_prep_move, cgroup_cancel_move, cgroup_move, cgroup_set_weight
>>>> operations.
>>>>
>>>> Signed-off-by: Juntong Deng <juntong.deng@...look.com>
>>>> ---
>>>>    kernel/sched/ext.c | 30 ++++++++++++++++++++++++++++++
>>>>    1 file changed, 30 insertions(+)
>>>>
>>>> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
>>>> index 7f039a32f137..955fb0f5fc5e 100644
>>>> --- a/kernel/sched/ext.c
>>>> +++ b/kernel/sched/ext.c
>>>> @@ -7079,9 +7079,39 @@ BTF_ID_FLAGS(func, scx_bpf_dispatch_from_dsq, KF_RCU)
>>>>    BTF_ID_FLAGS(func, scx_bpf_dispatch_vtime_from_dsq, KF_RCU)
>>>>    BTF_KFUNCS_END(scx_kfunc_ids_unlocked)
>>>>
>>>> +static int scx_kfunc_ids_unlocked_filter(const struct bpf_prog *prog, u32 kfunc_id)
>>>> +{
>>>> +       u32 moff;
>>>> +
>>>> +       if (!btf_id_set8_contains(&scx_kfunc_ids_unlocked, kfunc_id) ||
>>>> +           prog->aux->st_ops != &bpf_sched_ext_ops)
>>>> +               return 0;
>>>> +
>>>> +       moff = prog->aux->attach_st_ops_member_off;
>>>> +       if (moff == offsetof(struct sched_ext_ops, init) ||
>>>> +           moff == offsetof(struct sched_ext_ops, exit) ||
>>>> +           moff == offsetof(struct sched_ext_ops, cpu_online) ||
>>>> +           moff == offsetof(struct sched_ext_ops, cpu_offline) ||
>>>> +           moff == offsetof(struct sched_ext_ops, init_task) ||
>>>> +           moff == offsetof(struct sched_ext_ops, dump))
>>>> +               return 0;
>>>> +
>>>> +#ifdef CONFIG_EXT_GROUP_SCHED
>>>> +       if (moff == offsetof(struct sched_ext_ops, cgroup_init) ||
>>>> +           moff == offsetof(struct sched_ext_ops, cgroup_exit) ||
>>>> +           moff == offsetof(struct sched_ext_ops, cgroup_prep_move) ||
>>>> +           moff == offsetof(struct sched_ext_ops, cgroup_cancel_move) ||
>>>> +           moff == offsetof(struct sched_ext_ops, cgroup_move) ||
>>>> +           moff == offsetof(struct sched_ext_ops, cgroup_set_weight))
>>>> +               return 0;
>>>> +#endif
>>>> +       return -EACCES;
>>>> +}
>>>> +
>>>>    static const struct btf_kfunc_id_set scx_kfunc_set_unlocked = {
>>>>           .owner                  = THIS_MODULE,
>>>>           .set                    = &scx_kfunc_ids_unlocked,
>>>> +       .filter                 = scx_kfunc_ids_unlocked_filter,
>>>>    };
>>>
>>> why does sched-ext use so many id_sets?
>>>
>>>           if ((ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
>>>                                                &scx_kfunc_set_select_cpu)) ||
>>>               (ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
>>>                                                &scx_kfunc_set_enqueue_dispatch)) ||
>>>               (ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
>>>                                                &scx_kfunc_set_dispatch)) ||
>>>               (ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
>>>                                                &scx_kfunc_set_cpu_release)) ||
>>>               (ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
>>>                                                &scx_kfunc_set_unlocked)) ||
>>>
>>> Can they all be rolled into one id_set? Then patches 2-6 would
>>> collapse into one patch, with one filter callback that describes the
>>> allowed hook/kfunc combinations.
>>
>> Yes, I agree that it would be ideal to put all kfuncs in one id_set,
>> but I am not sure that is better in practice.
>>
>> For filters, the only kfunc-related information available is the
>> kfunc_id.
>>
>> kfunc_id is not a stable value; for example, adding a new kfunc to the
>> kernel may change the kfunc_id of other kfuncs.
>>
>> A simple experiment is to add a bpf_task_from_aaa kfunc, after which
>> we will find that the kfunc_id of bpf_task_from_pid has changed.
>>
>> This means that implementing kfunc grouping via id_sets is simple: we
>> only need to check whether the kfunc_id exists in a specific id_set,
>> without caring what the kfunc_id actually is.
>>
>> But to implement grouping only in the filter, we would first need to
>> get the btf type of the kfunc from the kfunc_id via btf_type_by_id,
>> then get the kfunc name, and then group based on the name in the
>> filter, which seems more complicated.
> 
> I didn't mean to extract the kfunc name as a string and do strcmp() on it.
> That's a non-starter.
> I imagined a verifier-like approach of enum+set+list,
> where the enum has all kfunc names,
> the set gives efficient btf_id_set8_contains() access,
> and list[KF_bpf_foo] gives the func_id to compare with.
> 
> But if the current breakdown of scx_kfunc_set_* fits well
> with per-struct_ops-hook filtering, then keep it.
> But please think of a set approach for moff as well, to avoid
> +           moff == offsetof(struct sched_ext_ops, exit) ||
> +           moff == offsetof(struct sched_ext_ops, cpu_online) ||
> +           moff == offsetof(struct sched_ext_ops, cpu_offline) ||
> 
> Then it will be:
> if (btf_id_set8_contains(&scx_kfunc_ids_unlocked, kfunc_id) ...
> && moff_set_contains(.._unlocked, moff)) // allow
> 
> There is SCX_OP_IDX(). Maybe it can be used to populate a set.
> 
> Something like this:
> static const u32 ops_flags[] = {
>    [SCX_OP_IDX(cpu_online)] = KF_UNLOCKED,
>    ..
> };
> 
> if (btf_id_set8_contains(&scx_kfunc_ids_unlocked, kfunc_id) &&
>      (ops_flags[moff / sizeof(void (*)(void))] & KF_UNLOCKED)) // allow

Thanks for letting me know about this method.

It is a good approach, and I have used it in version 2 [0].

Also, I figured out a way to require only one filter.

[0]: https://lore.kernel.org/bpf/AM6PR03MB5080855B90C3FE9B6C4243B099FE2@AM6PR03MB5080.eurprd03.prod.outlook.com/T/#u

