linux-kernel - [RFC PATCH bpf-next 0/7] bpf: BPF internal fine-grained permission management (BPF internal capabilities)

Message-ID:
 <AM6PR03MB5080C05323552276324C4B4C991A2@AM6PR03MB5080.eurprd03.prod.outlook.com>
Date: Thu, 16 Jan 2025 19:35:18 +0000
From: Juntong Deng <juntong.deng@...look.com>
To: ast@...nel.org,
	daniel@...earbox.net,
	john.fastabend@...il.com,
	andrii@...nel.org,
	martin.lau@...ux.dev,
	eddyz87@...il.com,
	song@...nel.org,
	yonghong.song@...ux.dev,
	kpsingh@...nel.org,
	sdf@...ichev.me,
	haoluo@...gle.com,
	jolsa@...nel.org,
	memxor@...il.com,
	tj@...nel.org,
	void@...ifault.com
Cc: bpf@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: [RFC PATCH bpf-next 0/7] bpf: BPF internal fine-grained permission management (BPF internal capabilities)

Overview
--------

This is a proof-of-concept patch series that aims to rethink the current
permission management of bpf programs.

This patch series is used to demonstrate the idea of BPF (internal)
capabilities (fine-grained permissions model) to solve the problems
caused by the coarse-grained permissions model based on program type
in the current BPF.

In this patch series, I consider what BPF kfuncs a bpf program can use,
what BPF helpers it can use, what BPF maps it can use, etc. as
permissions of the bpf program.

Note that the "capabilities" mentioned in this patch series have nothing
to do with Linux capabilities, nor with userspace.

The BPF capabilities in this patch series are capabilities that are ONLY
used internally in the bpf subsystem.

The ideas in this patch series come from previous discussions [0].

[0]: https://lore.kernel.org/bpf/AM6PR03MB5080DC63013560E26507079E99042@AM6PR03MB5080.eurprd03.prod.outlook.com/T/#t

Motivation
----------

Currently, the permission management of bpf programs is a coarse-grained
model based on program types. The program type determines the
permissions of the bpf program. 

This was fine when BPF had few usage scenarios, but it becomes
inappropriate as the number of usage scenarios grows.

The following are the current problems:

1. Cannot change the permissions of a bpf program in different contexts

Since permissions management in BPF is based on program type, once a
bpf program selects a program type, its permissions cannot be changed.

Currently sched-ext (SCX) is implemented based on the
BPF_PROG_TYPE_STRUCT_OPS program type, but SCX needs to enforce
different restrictions in different contexts. For example, some kfuncs
can only be used in the DISPATCH context, and some kfuncs can only be
used in the CPU_RELEASE context.

However, the current BPF permission management based on program type
cannot natively implement these restrictions. The current approach used
by SCX is dynamic detection, by adding masks to check at runtime if
disallowed kfuncs are being called, which results in runtime overhead.

Ideally, we could check for these incorrect uses of kfuncs via the
verifier without any runtime overhead.

2. Permission rules cannot be inherited and extended between program types

When one program type has a large number of the same base permissions as
another program type, the current permission model based on program
types cannot achieve "inheritance".

All kfuncs need to be registered to each program type separately and
populated into struct btf_id_set8 of each program type via
btf_populate_kfunc_set.

The current feature similar to "inheritance" is "alias".
BPF_PROG_TYPE_TRACING, BPF_PROG_TYPE_TRACEPOINT,
BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_LSM are actually "aliases"
of BTF_KFUNC_HOOK_TRACING.

So what should we do if there are differences in permissions between
"aliased" program types? We need to implement a filter callback function
to filter out some commonly registered kfuncs under different specific
program types.

This is obviously not an elegant solution.
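
To make the "alias plus filter" pattern concrete, here is a small
userspace model of it. It is only a sketch and not kernel code: the
types, the two kfunc ids and the names prog_type_to_hook, tracing_filter
and kfunc_allowed are all hypothetical, and the aliasing simply follows
the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; these are NOT the kernel definitions. */
enum prog_type { PROG_TRACING, PROG_TRACEPOINT, PROG_PERF_EVENT, PROG_LSM };
enum kfunc_hook { HOOK_TRACING, HOOK_MAX };

/* Hypothetical kfunc ids used only for this illustration. */
enum { KFUNC_COMMON = 1, KFUNC_LSM_ONLY = 2 };

/* Several program types collapse ("alias") onto one shared hook. */
static enum kfunc_hook prog_type_to_hook(enum prog_type type)
{
	switch (type) {
	case PROG_TRACING:
	case PROG_TRACEPOINT:
	case PROG_PERF_EVENT:
	case PROG_LSM:
		return HOOK_TRACING;
	}
	return HOOK_MAX;
}

/* A kfunc set registered for a hook, with an optional filter. */
struct kfunc_set {
	const int *ids;
	size_t cnt;
	/* Returns nonzero to reject kfunc_id for this program type. */
	int (*filter)(enum prog_type type, int kfunc_id);
};

static const int tracing_ids[] = { KFUNC_COMMON, KFUNC_LSM_ONLY };

/* The filter carves per-type exceptions out of the shared set. */
static int tracing_filter(enum prog_type type, int kfunc_id)
{
	if (kfunc_id == KFUNC_LSM_ONLY && type != PROG_LSM)
		return -1;
	return 0;
}

static const struct kfunc_set hook_sets[HOOK_MAX] = {
	[HOOK_TRACING] = { tracing_ids, 2, tracing_filter },
};

/* Allowed only if registered for the hook AND not rejected by the filter. */
static bool kfunc_allowed(enum prog_type type, int kfunc_id)
{
	enum kfunc_hook hook = prog_type_to_hook(type);
	const struct kfunc_set *set;
	size_t i;

	assert(hook < HOOK_MAX);
	set = &hook_sets[hook];
	for (i = 0; i < set->cnt; i++) {
		if (set->ids[i] != kfunc_id)
			continue;
		return !set->filter || set->filter(type, kfunc_id) == 0;
	}
	return false;
}
```

In this model every difference between aliased program types turns into
another special case inside the filter, which is the part that does not
scale well.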


All of the above problems stem from the fact that the current
coarse-grained bpf permission model based on program type is no longer
appropriate, and we need to rethink it.

What we need to face is:

1. ONE bpf program type can be used in MANY different contexts
(scenarios), and these contexts may have different restrictions.

2. There will be more bpf program types, and there will be a lot of
common permissions between different program types.

When faced with complex permission management, we need a fine-grained
permission management model. It is difficult to achieve fine-grained
permission division on top of a coarse-grained permission model.

The current SCX mask and filter callback functions are band-aids for
this coarse-grained permission model.
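
As a rough illustration of why the runtime mask costs something, the
dynamic detection described in problem 1 can be modelled in userspace
as below. This is a simplified sketch, not the SCX code: struct
task_model, the CTX_* bits and the function names are assumptions made
up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical context bits; they only echo the contexts named above. */
#define CTX_DISPATCH    (1u << 0)
#define CTX_CPU_RELEASE (1u << 1)
#define CTX_ENQUEUE     (1u << 2)

/* Stand-in for per-task state recording which contexts are active. */
struct task_model {
	uint32_t kf_mask;
};

/* This test is paid on every call of a guarded kfunc. */
static bool kf_allowed(const struct task_model *t, uint32_t required)
{
	return (t->kf_mask & required) != 0;
}

/* Models a kfunc that is only valid in the dispatch context. */
static int model_move_to_local(const struct task_model *t)
{
	if (!kf_allowed(t, CTX_DISPATCH))
		return -1;	/* misuse is only found when the call executes */
	return 0;
}

/* The caller of an ops callback sets and clears the mask around it. */
static int run_in_context(struct task_model *t, uint32_t ctx,
			  int (*op)(const struct task_model *))
{
	int ret;

	assert(op);
	t->kf_mask |= ctx;
	ret = op(t);
	t->kf_mask &= ~ctx;
	return ret;
}
```

Here run_in_context(&t, CTX_ENQUEUE, model_move_to_local) fails only
once the call actually runs, whereas a check in the verifier would
reject the program at load time and leave nothing to test at runtime.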

BPF Capabilities
----------------

BPF capabilities is a capability-based permission model used internally
in the BPF subsystem. In BPF capabilities, all kfuncs will be registered
into different capabilities according to fine-grained permission
division, rather than directly registered into the program type.

What BPF capabilities aims to achieve:

1. Fine-grained permission division

All kfuncs can be divided into different sets according to their
functions and registered to different capabilities, such as
BPF_CAP_FS, BPF_CAP_LSM, BPF_CAP_SCX_DISPATCH. 

In this way, we can enable or disable some features in
different contexts.

2. Dynamically enable and disable capabilities

The bpf verifier maintains a list of capabilities that are
currently enabled for the bpf program. This list can be modified in
different contexts. 

When a bpf program accesses a feature corresponding to an enabled
capability, it will be allowed, but if it accesses a feature
corresponding to a disabled capability, it will be denied.

3. Capabilities hierarchy

Capabilities can be organized in a hierarchy. For example, we can
define TRACING_CAP_BASE, which includes all common capabilities in
tracing scenarios and can be used in BPF_PROG_TYPE_TRACING,
BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT,
and BPF_PROG_TYPE_LSM.

We do not need to list all required capabilities separately for
each program type.

4. Low-coupling capabilities system

Different subsystems can define their own capabilities and change the
capabilities of a bpf program (enable or disable) in the verifier in
different contexts in an appropriate way.

All of this does not require modifications to the BPF core and needs
to be decoupled from the BPF core.
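
The first three goals can be pictured with a small userspace model
using a bitmask. This is only a sketch under assumptions: the enum
numbering, CAP_TRACE_COMMON, the *_MODEL macros and the choice of which
capabilities go into the base set are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Names follow the examples above; the numbering is an assumption. */
enum cap_model {
	CAP_FS,
	CAP_LSM,
	CAP_SCX_DISPATCH,
	CAP_TRACE_COMMON,	/* hypothetical: shared by tracing programs */
	CAP_MAX,
};

#define CAP_BIT(c) (UINT64_C(1) << (c))

/* A base set groups capabilities shared by several program types.
 * Its contents here are arbitrary and only for illustration. */
#define TRACING_CAP_BASE_MODEL (CAP_BIT(CAP_TRACE_COMMON) | CAP_BIT(CAP_FS))

/* Program types start from the base and add only what differs. */
#define TRACEPOINT_CAPS_MODEL (TRACING_CAP_BASE_MODEL)
#define LSM_CAPS_MODEL        (TRACING_CAP_BASE_MODEL | CAP_BIT(CAP_LSM))

static bool cap_enabled(uint64_t caps, enum cap_model c)
{
	assert(c < CAP_MAX);
	return (caps & CAP_BIT(c)) != 0;
}

/* Enabling and disabling return the new set of enabled capabilities. */
static uint64_t cap_enable(uint64_t caps, enum cap_model c)
{
	assert(c < CAP_MAX);
	return caps | CAP_BIT(c);
}

static uint64_t cap_disable(uint64_t caps, enum cap_model c)
{
	assert(c < CAP_MAX);
	return caps & ~CAP_BIT(c);
}
```

With this layout a new tracing-like program type only needs to name
TRACING_CAP_BASE_MODEL plus its own extras, instead of repeating every
shared capability.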

Proof of Concept Alert
----------------------

Note that this is an early-stage proof of concept, and none of the code
in this patch series is well designed.

It is a minimal proof of concept meant only to demonstrate the idea.
The code is full of bugs and rough edges, so please bear with it.

Current Implementation
----------------------

The implementation in this patch series is one possible way to implement
BPF capabilities. Other, better implementations of BPF capabilities are
open for discussion.

1. Fine-grained permission division

I added a new field "capability" in BTF_ID to record the capability of
each kfunc. This field is set when the kfunc sets are registered.

All kfuncs will be put into the same struct btf_id_set8, and will no
longer be divided into different sets according to program type.

All permission management is based on capabilities, not program types.

2. Dynamically enable and disable capabilities

I added a bitmap "bpf_capabilities" to struct bpf_verifier_env to record
the capabilities currently enabled for the bpf program.

This bitmap can be changed in different contexts. In check_kfunc_call,
the bitmap is used to determine whether the kfunc call is legal.

3. Capabilities hierarchy

I used macros to define sets of base capabilities, such as
STRUCT_OPS_BASE_CAPS.

The capabilities enabled by default for each program type are defined
via an array, which can contain base capability macros.

4. Low-coupling capabilities system

I added the bpf_capabilities_adjust callback function to
struct bpf_verifier_ops and the context information context_info
to struct bpf_verifier_env (in the case of SCX, this context
information may be "moff").

Passing context_info to the bpf_capabilities_adjust callback function
allows the implementer to determine the current context and make changes
to the enabled capabilities list of the bpf program in the verifier.
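
The interaction between the bitmap, context_info and the adjust
callback can be sketched as a userspace model. This is not the code
from the patches: the struct layouts, the CAP_* and CTX_* values and
every *_model name are assumptions; only the overall flow (defaults,
then adjust, then a check at the kfunc call) follows the description
above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical capabilities for an SCX-like subsystem. */
enum { CAP_SCX_BASE, CAP_SCX_DISPATCH, CAP_SCX_CPU_RELEASE, CAP_COUNT };

#define CAP_BIT(c) (UINT64_C(1) << (c))

/* Hypothetical context ids standing in for context_info (e.g. "moff"). */
enum { CTX_ENQUEUE, CTX_DISPATCH, CTX_CPU_RELEASE };

struct verifier_env_model {
	uint64_t capabilities;	/* models the bpf_capabilities bitmap */
	int context_info;
};

struct verifier_ops_model {
	/* models bpf_capabilities_adjust: the subsystem edits the bitmap */
	void (*capabilities_adjust)(uint64_t *caps, int context_info);
};

/* Subsystem side: enable context-specific capabilities only where valid. */
static void scx_adjust_model(uint64_t *caps, int context_info)
{
	*caps &= ~(CAP_BIT(CAP_SCX_DISPATCH) | CAP_BIT(CAP_SCX_CPU_RELEASE));
	if (context_info == CTX_DISPATCH)
		*caps |= CAP_BIT(CAP_SCX_DISPATCH);
	else if (context_info == CTX_CPU_RELEASE)
		*caps |= CAP_BIT(CAP_SCX_CPU_RELEASE);
}

static const struct verifier_ops_model scx_ops_model = {
	.capabilities_adjust = scx_adjust_model,
};

/* Core side: start from defaults, then let the subsystem adjust them. */
static void env_init_model(struct verifier_env_model *env,
			   const struct verifier_ops_model *ops,
			   int context_info)
{
	env->capabilities = CAP_BIT(CAP_SCX_BASE);
	env->context_info = context_info;
	if (ops->capabilities_adjust)
		ops->capabilities_adjust(&env->capabilities, context_info);
}

/* Models the capability test performed when a kfunc call is verified. */
static int check_kfunc_call_model(const struct verifier_env_model *env,
				  int kfunc_cap)
{
	assert(kfunc_cap >= 0 && kfunc_cap < CAP_COUNT);
	return (env->capabilities & CAP_BIT(kfunc_cap)) ? 0 : -EACCES;
}
```

The core only calls the callback and tests bits; what each context
enables is decided entirely on the subsystem side, which is the
low-coupling property described in item 4.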

Test Results
------------

For testing I added scx_simple_cap_test, in which enqueue calls
scx_bpf_dsq_move_to_local, which is not allowed in that context.
If we run this program, the verifier reports an error.

./build/bin/scx_simple_cap_test 
libbpf: prog 'simple_enqueue': BPF program load failed: -EACCES
libbpf: prog 'simple_enqueue': -- BEGIN PROG LOAD LOG --
...
17: (85) call scx_bpf_dsq_move_to_local#135437
The bpf program does not have the capability to call scx_bpf_dsq_move_to_local
...
libbpf: failed to load BPF skeleton 'scx_simple_cap_test': -EACCES
[SCX_BUG] scx_simple_cap_test.c:88 (Permission denied)
Failed to load skel

But if we run scx_simple, the program can run normally.

./build/bin/scx_simple
[  152.792015] sched_ext: BPF scheduler "simple" enabled
local=7 global=0
local=30 global=3
local=33 global=11

More
----

BPF capabilities is a general function that is flexible and extensible. 

In my opinion, BPF capabilities can be used not only to manage kfuncs,
but also to manage permissions for all features of BPF, including
BPF helpers, BPF maps, etc.

We can associate these features with a capability, so that the bpf
verifier can manage them according to different contexts.

Maybe we could also make BPF capabilities configurable through /sys/bpf
or associate some BPF capabilities with Linux capabilities, so that
system administrators can choose to expose only part of the BPF
features to certain users.

Related Suggestions
-------------------

In the current implementation, I need to add capability information to
each kfunc. This is implemented by modifying the BTF_ID structure.

But I cannot modify BTF_ID directly, because BTF_ID is used for data
structures in addition to kfuncs, and data structures do not need
capability information.

My suggestion is to use BTF_ID_FLAGS for all kfuncs and only use BTF_ID
for data structures.

This way we can distinguish kfuncs from data structures.
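
The split suggested here can be pictured with two simplified userspace
structures. This is a sketch only: the layouts and names below
(id_list_model, kfunc_set_model, the capability member) are assumptions
for illustration and are not the definitions in btf_ids.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain id list, as used for data structures: no per-entry metadata. */
struct id_list_model {
	uint32_t cnt;
	uint32_t ids[4];
};

/* Flagged set, as used for kfuncs: every entry carries metadata.
 * The capability member is the hypothetical addition. */
struct kfunc_set_model {
	uint32_t cnt;
	struct {
		uint32_t id;
		uint32_t flags;
		uint32_t capability;
	} pairs[4];
};

/* Returns the capability recorded for a kfunc id, or -1 if absent. */
static int kfunc_capability(const struct kfunc_set_model *set, uint32_t id)
{
	uint32_t i;

	assert(set);
	for (i = 0; i < set->cnt; i++) {
		if (set->pairs[i].id == id)
			return (int)set->pairs[i].capability;
	}
	return -1;
}
```

Because only the flagged form has room for a capability, keeping all
kfuncs in that form means plain id lists for data structures stay
unchanged.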

At The End
----------

This is a proof-of-concept patch series that rethinks the current BPF
permissions management.

The ideas and the implementation are not complete yet, but BPF
capabilities may be a better solution than the current program
type-based permission management.

Discussion and feedback are welcome!

Many thanks.

Signed-off-by: Juntong Deng <juntong.deng@...look.com>

Juntong Deng (7):
  bpf: Add capability field to BTF_ID_FLAGS
  bpf: Add enum bpf_capability
  bpf: Add capabilities version of kfuncs registration
  bpf: Make the verifier support BPF capabilities
  bpf: Add default BPF capabilities initialization for program types
  sched_ext: Make SCX use BPF capabilities
  sched_ext: Add proof-of-concept test case

 include/linux/bpf.h                       |   2 +
 include/linux/bpf_verifier.h              |   6 +
 include/linux/btf.h                       |   8 +-
 include/linux/btf_ids.h                   |   6 +-
 include/uapi/linux/bpf.h                  |  15 ++
 kernel/bpf/btf.c                          | 165 +++++++++++++++++++++-
 kernel/bpf/verifier.c                     |  66 ++++++++-
 kernel/sched/ext.c                        |  74 ++++++++--
 tools/bpf/resolve_btfids/main.c           |   2 +-
 tools/include/linux/btf_ids.h             |   1 +
 tools/sched_ext/Makefile                  |   2 +-
 tools/sched_ext/scx_simple_cap_test.bpf.c | 159 +++++++++++++++++++++
 tools/sched_ext/scx_simple_cap_test.c     | 107 ++++++++++++++
 13 files changed, 590 insertions(+), 23 deletions(-)
 create mode 100644 tools/sched_ext/scx_simple_cap_test.bpf.c
 create mode 100644 tools/sched_ext/scx_simple_cap_test.c

-- 
2.39.5