linux-kernel - Re: per st_ops kfunc allow/deny mask. Was: [PATCH bpf-next v6 4/5] bpf: Make fs kfuncs available for SYSCALL program type

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
 <AM6PR03MB5080A4F12672D91CC52345AA991A2@AM6PR03MB5080.eurprd03.prod.outlook.com>
Date: Thu, 16 Jan 2025 19:52:36 +0000
From: Juntong Deng <juntong.deng@...look.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>,
 Tejun Heo <tj@...nel.org>
Cc: Alexei Starovoitov <ast@...nel.org>,
 Daniel Borkmann <daniel@...earbox.net>,
 John Fastabend <john.fastabend@...il.com>,
 Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau
 <martin.lau@...ux.dev>, Eddy Z <eddyz87@...il.com>,
 Song Liu <song@...nel.org>, Yonghong Song <yonghong.song@...ux.dev>,
 KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>,
 Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
 Kumar Kartikeya Dwivedi <memxor@...il.com>, snorcht@...il.com,
 Christian Brauner <brauner@...nel.org>, bpf <bpf@...r.kernel.org>,
 LKML <linux-kernel@...r.kernel.org>,
 Linux-Fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: per st_ops kfunc allow/deny mask. Was: [PATCH bpf-next v6 4/5]
 bpf: Make fs kfuncs available for SYSCALL program type

On 2025/1/10 20:42, Juntong Deng wrote:
> On 2025/1/9 19:23, Alexei Starovoitov wrote:
>> On Mon, Dec 23, 2024 at 4:51 PM Juntong Deng 
>> <juntong.deng@...look.com> wrote:
>>>
>>>>
>>>> The main goal is to get rid of run-time mask check in SCX_CALL_OP() and
>>>> make it static by the verifier. To make that happen scx_kf_mask flags
>>>> would need to become KF_* flags while each struct-ops callback will
>>>> specify the expected mask.
>>>> Then at struct-ops prog attach time the verifier will see the 
>>>> expected mask
>>>> and can check that all kfuncs calls of this particular program
>>>> satisfy the mask. Then all of the runtime overhead of
>>>> current->scx.kf_mask and scx_kf_allowed() will go away.
>>>
>>> Thanks for pointing this out.
>>>
>>> Yes, I am interested in working on it.
>>>
>>> I will try to solve this problem in a separate patch series.
>>>
>>>
>>> The following are my thoughts:
>>>
>>> Should we really use KF_* to do this? I think KF_* is currently more
>>> like declaring that a kfunc has some kind of attribute, e.g.
>>> KF_TRUSTED_ARGS means that the kfunc only accepts trusted arguments,
>>> rather than being used to categorise kfuncs.
>>>
>>> It is not sustainable to restrict the kfuncs that can be used based on
>>> program types, which are coarse-grained. This problem will get worse
>>> as kfuncs increase.
>>>
>>> In my opinion, managing the kfuncs available to bpf programs should be
>>> implemented as capabilities. Capabilities are a mature permission model.
>>> We can treat a set of kfuncs as a capability (like the various current
>>> kfunc_sets, but the current kfunc_sets did not carefully divide
>>> permissions).
>>>
>>> We should use separate BPF_CAP_XXX flags to manage these capabilities.
>>> For example, SCX may define BPF_CAP_SCX_DISPATCH.
>>>
>>> For program types, we should divide them into two levels, types and
>>> subtypes. Types are used to register common capabilities and subtypes
>>> are used to register specific capabilities. The verifier can check if
>>> the used kfuncs are allowed based on the type and subtype of the bpf
>>> program.
>>>
>>> I understand that we need to maintain backward compatibility to
>>> userspace, but capabilities are internal changes in the kernel.
>>> Perhaps we can make the current program types as subtypes and
>>> add 'types' that are only used internally, and more subtypes
>>> (program types) can be added in the future.
>>
>> Sorry for the delay.
>> imo CAP* approach doesn't fit.
>> caps are security bits exposed to user space.
>> Here there is no need to expose anything to user space.
>>
>> But you're also correct that we cannot extend kfunc KF_* flags
>> that easily. KF_* flags are limited to 32-bit and we're already
>> using 12 bits.
>> enum scx_kf_mask needs 5 bits, so we can squeeze them into
>> the current 32-bit field _for now_,
>> but eventually we'd need to refactor kfunc definition into a wider set:
>> BTF_ID_FLAGS(func, .. KF_*)
>> so that different struct_ops consumers can define their own bits.
>>
>> Right now SCX is the only st_ops consumer who needs this feature,
>> so let's squeeze into the existing KF facility.
>>
>> First step is to remap scx_kf_mask bits into unused bits in KF_
>> and annotate corresponding sched-ext kfuncs with it.
>> For example:
>> SCX_KF_DISPATCH will become
>> KF_DISPATCH (1 << 13)
>>
>> and all kfuncs that are allowed to be called from ->dispatch() callback
>> will be annotated like:
>> - BTF_KFUNCS_START(scx_kfunc_ids_dispatch)
>> - BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots)
>> - BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel)
>> + BTF_KFUNCS_START(scx_kfunc_ids_dispatch)
>> + BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots, KF_DISPATCH)
>> + BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel, KF_DISPATCH)
>>
>>
>> For sched_ext_ops callback annotations, I think,
>> the simplest approach is to add special
>> BTF_SET8_START(st_ops_flags)
>> BTF_ID_FLAGS(func, sched_ext_ops__dispatch, KF_DISPATCH)
>> and so on for other ops stubs.
>>
>> sched_ext_ops__dispatch() is an empty function that
>> exists in the vmlinux, and though it's not a kfunc
>> we can use it to annotate
>> (struct sched_ext_ops *)->dispatch() callback
>> with a particular KF_ flag
>> (or a set of flags for SCX_KF_RQ_LOCKED case).
>>
>> Then the verifier (while analyzing the program that is targeted
>> to be attach to this ->dispatch() hook)
>> will check this extra KF flag in st_ops
>> and will only allow to call kfuncs with matching flags:
>>
>> if (st_ops->kf_mask & kfunc->kf_mask) // ok to call kfunc from this 
>> callback
>>
>> The end result current->scx.kf_mask will be removed
>> and instead of run-time check it will become static verifier check.
> 
> Sorry, I may not have explained my idea clearly.
> 
> The "capabilities" I mentioned have nothing to do with userspace.
> The "capabilities" I mentioned are conceptual, not referring to the
> capabilities in the Linux.
> 
> My idea is that a similar "capabilities" mechanism should be used
> inside the BPF subsystem (separate).
> 
> I think the essence of the problem is that ONE bpf program type can
> be used in MANY different contexts, but different contexts can have
> different restrictions.
> 
> It is reasonable for one bpf program type to be used in different
> contexts. There is no need for one bpf program type to be used
> in only one context.
> 
> But currently the "permission" management of the BPF subsystem is
> completely based on the bpf program type, which is a coarse-grained
> model (for example, what kfuncs are allowed to be used, which can
> be considered as permissions).
> 
> As BPF is used in more and more scenarios, and as one bpf program type
> is used in more and more different scenarios, the coarse-grained problem
> starts to emerge. It is difficult to divide permissions in different
> contexts based on a coarse-grained permission model.
> 
> This is why I said that the BPF subsystem should have its own
> "capabilities" (again, not part of Linux capabilities, and nothing
> to do with userspace).
> 
> In my opinion, we should separate permission management from bpf program
> types. We need an extra layer of abstraction so that we can achieve
> fine-grained permission management.
> 
> The reason why I have the idea of capabilities is because in my opinion,
> bpf programs need application-like permissions management in a sense,
> because BPF is generic.
> 
> When BPF is applied to other subsystems (e.g. scheduling, security,
> accessing information from other subsystems), we need something like
> capabilities. Each subsystem can define its own set of bpf capabilities
> to restrict the features that can be used by bpf programs in different
> contexts, so that bpf programs can only use a subset of features.
> 
> Another advantage of this approach is that bpf capabilities do not need
> to be tightly placed inside the bpf core, people in other subsystems can
> define them externally and add bpf capabilities they need (Adding
> KF_FLAGS can be considered as modifying the bpf core, right?).
> 
> Of course, maybe one day in the future, we may be able to associate bpf
> capabilities with Linux capabilities, maybe system administrators can
> choose to open only some of bpf features to certain users, and maybe all
> of this can be configured through /sys/bpf.
> 
> So, how do we implement this in the verifier? I think registering the
> bpf capabilities is not an problem, it is consistent with the current
> registration of kfuncs to the bpf program type, we still use
> struct btf_kfunc_id_set.
> 
> The really interesting part is how we allow people from different
> subsystems to change the capabilities of the bpf program in different
> contexts under the same bpf program type. My idea is to add a new
> callback function in struct bpf_verifier_ops, say bpf_ctx_allowed_cap.
> We can pass context information to this callback function, and the
> person who implements this callback function can decide to adjust the
> current capabilities (add or delete) in different contexts. In the
> case of bpf_struct_ops, the context information may be "moff".
> 
> In my opinion, capabilities are a flexible and extensible approach.
> All it takes is adding a layer of abstraction to decouple permissions
> from program types.
> 
> Of course, there are more other technical details that need to be
> figured out, and if you think the bpf capabilities is an interesting
> idea worth trying, I will try to write a minimal POC and send an
> RFC PATCH.
> 
> (Actually I have always wanted to write a POC but in the last two weeks
> I have been busy with my university stuff and really didn't have the
> time, but now I finally have some time to try it)
> 
> Yes, maybe I am a bit too forward thinking, at the moment only SCX has
> hit this "wall", and at the moment we can indeed solve it directly with
> KF_FLAGS or filters.
> 
> But I think the problem will persist and come up again in other
> scenarios, and maybe we can try something new and maybe not just
> solve the SCX problem.
> 
> Does this make sense?
> 
> Maybe we can have more discussion.
> 
> Many thanks.

Hello, I sent a proof-of-concept patch series of BPF capabilities [0].

[0]: 
https://lore.kernel.org/bpf/AM6PR03MB5080C05323552276324C4B4C991A2@AM6PR03MB5080.eurprd03.prod.outlook.com/T/#t