Message-ID: <cover.1763457705.git.zhuhui@kylinos.cn>
Date: Wed, 19 Nov 2025 09:34:05 +0800
From: Hui Zhu <hui.zhu@...ux.dev>
To: Andrew Morton <akpm@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Michal Hocko <mhocko@...nel.org>,
	Roman Gushchin <roman.gushchin@...ux.dev>,
	Shakeel Butt <shakeel.butt@...ux.dev>,
	Muchun Song <muchun.song@...ux.dev>,
	Alexei Starovoitov <ast@...nel.org>,
	Daniel Borkmann <daniel@...earbox.net>,
	Andrii Nakryiko <andrii@...nel.org>,
	Martin KaFai Lau <martin.lau@...ux.dev>,
	Eduard Zingerman <eddyz87@...il.com>,
	Song Liu <song@...nel.org>,
	Yonghong Song <yonghong.song@...ux.dev>,
	John Fastabend <john.fastabend@...il.com>,
	KP Singh <kpsingh@...nel.org>,
	Stanislav Fomichev <sdf@...ichev.me>,
	Hao Luo <haoluo@...gle.com>,
	Jiri Olsa <jolsa@...nel.org>,
	Shuah Khan <shuah@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Miguel Ojeda <ojeda@...nel.org>,
	Nathan Chancellor <nathan@...nel.org>,
	Kees Cook <kees@...nel.org>,
	Tejun Heo <tj@...nel.org>,
	Jeff Xu <jeffxu@...omium.org>,
	mkoutny@...e.com,
	Jan Hendrik Farr <kernel@...rr.cc>,
	Christian Brauner <brauner@...nel.org>,
	Randy Dunlap <rdunlap@...radead.org>,
	Brian Gerst <brgerst@...il.com>,
	Masahiro Yamada <masahiroy@...nel.org>,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	cgroups@...r.kernel.org,
	bpf@...r.kernel.org,
	linux-kselftest@...r.kernel.org
Cc: Hui Zhu <zhuhui@...inos.cn>
Subject: [RFC PATCH 0/3] Memory Controller eBPF support

From: Hui Zhu <zhuhui@...inos.cn>

This series proposes adding eBPF support to the Linux memory
controller, enabling dynamic and extensible memory management
policies at runtime.

Background

The memory controller (memcg) currently provides fixed memory
accounting and reclamation policies through static kernel code.
This limits flexibility for specialized workloads and use cases
that require custom memory management strategies.

By enabling eBPF programs to hook into key memory control
operations, administrators can implement custom policies without
recompiling the kernel, while maintaining the safety guarantees
provided by the BPF verifier.

Use Cases

1. Custom memory reclamation strategies for specialized workloads
2. Dynamic memory pressure monitoring and telemetry
3. Memory accounting adjustments based on runtime conditions
4. Integration with container orchestration systems for
   intelligent resource management
5. Research and experimentation with novel memory management
   algorithms

Design Overview

This series introduces:

1. A new BPF struct ops type (`memcg_ops`) that allows eBPF
   programs to implement custom behavior for memory charging
   operations.

2. A hook point in the `try_charge_memcg()` fast path that
   invokes registered eBPF programs to determine if custom
   memory management should be applied.

3. A handler interface that lets the eBPF program inspect memory
   cgroup context and optionally modify certain parameters
   (e.g., `nr_pages` for reclamation size).

4. A reference counting mechanism using `percpu_ref` to safely
   manage the lifecycle of registered eBPF struct ops instances.

5. Configuration via `CONFIG_MEMCG_BPF` to allow disabling this
   feature at build time.
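
As a rough illustration of items 1-3, the following is a user-space
model of the hook, not the code from this series. Every name in it
(`charge_ctx`, `memcg_ops_model`, `model_reclaim_target`,
`boost_handler`) is hypothetical, the plain `bool` only stands in for
the static branch key, and the way `nr_pages` feeds into reclaim is an
assumption made for the sketch.

```c
/* User-space model of the charge hook. NOT kernel code; all names are
 * hypothetical and chosen only for this illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context handed to the handler. */
struct charge_ctx {
	unsigned long usage_pages;	/* pages currently charged */
	unsigned long limit_pages;	/* limit of the group */
	unsigned int nr_pages;		/* reclaim size; handler may change it */
};

/* Hypothetical callback table standing in for a struct ops instance. */
struct memcg_ops_model {
	/* Returns nonzero if the handler adjusted the context. */
	int (*try_charge)(struct charge_ctx *ctx);
};

/* Stand-ins for the static branch key and the registered instance. */
static bool hook_enabled;
static const struct memcg_ops_model *active_ops;

/*
 * Models one charge attempt: the handler (if any) is consulted first,
 * then the function reports how many pages would have to be reclaimed.
 * Returns 0 when the request fits under the limit.
 */
static unsigned int model_reclaim_target(unsigned long usage,
					 unsigned long limit,
					 unsigned int nr_pages)
{
	struct charge_ctx ctx = {
		.usage_pages = usage,
		.limit_pages = limit,
		.nr_pages = nr_pages,
	};

	/* One indirect call per attempt, skipped when the hook is off. */
	if (hook_enabled && active_ops && active_ops->try_charge)
		active_ops->try_charge(&ctx);

	if (usage + nr_pages <= limit)
		return 0;
	return ctx.nr_pages;
}

/* Example policy: never reclaim fewer than 64 pages at a time. */
static int boost_handler(struct charge_ctx *ctx)
{
	assert(ctx != NULL);
	if (ctx->nr_pages < 64) {
		ctx->nr_pages = 64;
		return 1;
	}
	return 0;
}

static const struct memcg_ops_model boost_ops = {
	.try_charge = boost_handler,
};
```

With the hook off, a 4-page request against a full group yields a
reclaim target of 4 pages. After setting `active_ops = &boost_ops` and
`hook_enabled = true`, the same request yields 64.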

Implementation Details

- Uses BPF struct ops for a cleaner integration model
- Leverages static branch keys for minimal overhead when the
  feature is unused
- RCU synchronization ensures safe replacement of handlers
- Sample eBPF program demonstrates monitoring capabilities
- Comprehensive selftest suite validates core functionality
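
The lifecycle rule behind the reference counting can be sketched in
user space with C11 atomics. This is a simplified model of the
semantics that `percpu_ref_tryget_live()`, `percpu_ref_put()` and
`percpu_ref_kill()` provide in the kernel, not the code from this
series: there are no per-CPU counters and no RCU grace period, and the
`ops_ref` type and its helpers are hypothetical names.

```c
/* Simplified user-space model of a "live" reference count.
 * All names are hypothetical; this is not the kernel's percpu_ref. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct ops_ref {
	atomic_long count;	/* starts at 1: the registration's own ref */
	atomic_bool dying;	/* set once unregistration has started */
	int released;		/* set to 1 when the last reference is dropped */
};

static void ops_ref_init(struct ops_ref *r)
{
	atomic_init(&r->count, 1);
	atomic_init(&r->dying, false);
	r->released = 0;
}

/* Drop one reference; the final put marks the object as released. */
static void ops_ref_put(struct ops_ref *r)
{
	long old = atomic_fetch_sub(&r->count, 1);

	assert(old > 0);	/* an unbalanced put is a bug */
	if (old == 1)
		r->released = 1;
}

/* Take a reference only while the object is live and still referenced. */
static bool ops_ref_tryget_live(struct ops_ref *r)
{
	long old = atomic_load(&r->count);

	if (atomic_load(&r->dying))
		return false;
	while (old > 0) {
		/* On failure, 'old' is reloaded and the loop retries. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Start teardown: refuse new users and drop the initial reference. */
static void ops_ref_kill(struct ops_ref *r)
{
	if (!atomic_exchange(&r->dying, true))
		ops_ref_put(r);
}
```

In this model a caller on the charge path would take a reference before
invoking the handler and drop it afterwards, so an instance that is
being unregistered is released only after the last in-flight caller has
finished.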

Performance Considerations

- Near-zero overhead when the feature is compiled out or no eBPF
  program is loaded (the static branch stays disabled, so the
  hook is skipped)
- Minimal overhead when enabled: one indirect function call per
  charge attempt
- eBPF programs run under the restrictions of the BPF verifier

Patch Overview

PATCH 1/3: Core kernel implementation
  - Adds eBPF struct ops support to memcg
  - Introduces CONFIG_MEMCG_BPF option
  - Implements safe registration/unregistration mechanism

PATCH 2/3: Selftest suite
  - prog_tests/memcg_ops.c: Test entry points
  - progs/memcg_ops.c: Test eBPF program
  - Validates load, attach, and single-handler constraints

PATCH 3/3: Sample eBPF program and userspace loader
  - samples/bpf/memcg_printk.bpf.c: Monitoring eBPF program
  - samples/bpf/memcg_printk.c: Userspace loader
  - Demonstrates real-world usage and debugging capabilities

Open Questions & Discussion Points

1. Should the eBPF handler have access to additional memory
   cgroup state? Current design exposes minimal context to
   reduce attack surface.

2. Are there other memory control operations that would benefit
   from eBPF extensibility (e.g., uncharge, reclaim)?

3. Should there be permission checks or restrictions on who can
   load memcg eBPF programs? Currently inherits BPF's
   CAP_PERFMON/CAP_SYS_ADMIN requirements.

4. How should we handle multiple eBPF programs trying to
   register? Current implementation allows only one active
   handler.

5. Is the context currently exposed in the `try_charge_memcg`
   struct sufficient, or should additional fields be added?
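
For question 4, the single-handler behaviour can be pictured as one
slot that is claimed with a compare-and-exchange. The sketch below is a
user-space model with hypothetical names (`handler`,
`handler_register`, `handler_unregister`). The error codes are
assumptions as well, since this cover letter does not state what a
second registration returns.

```c
/* User-space model of a single-slot handler registry.
 * Names and error codes are assumptions made for this sketch. */
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct handler {
	const char *name;
};

/* The one and only slot; NULL means no handler is registered. */
static _Atomic(struct handler *) active_handler;

/* Claim the slot if it is empty, otherwise report that it is taken. */
static int handler_register(struct handler *h)
{
	struct handler *expected = NULL;

	if (!h)
		return -EINVAL;
	if (!atomic_compare_exchange_strong(&active_handler, &expected, h))
		return -EBUSY;
	return 0;
}

/* Release the slot, but only if 'h' is the handler that holds it. */
static int handler_unregister(struct handler *h)
{
	struct handler *expected = h;

	if (!h)
		return -EINVAL;
	if (!atomic_compare_exchange_strong(&active_handler, &expected,
					    (struct handler *)NULL))
		return -ENOENT;
	return 0;
}

/* Usage: the second registration fails until the first is removed. */
static void single_handler_demo(void)
{
	struct handler a = { "a" }, b = { "b" };

	assert(handler_register(&a) == 0);
	assert(handler_register(&b) == -EBUSY);
	assert(handler_unregister(&a) == 0);
	assert(handler_register(&b) == 0);
	assert(handler_unregister(&b) == 0);
}
```

Supporting several handlers would mean replacing the single slot with a
list or array and defining an ordering and a rule for combining their
results, which is the policy question raised in item 4.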

Testing

The selftests cover loading, attaching, and the single-handler
constraint. The sample program can be used for manual testing
and as a reference for implementing additional monitoring tools.

Hui Zhu (3):
  memcg: add eBPF struct ops support for memory charging
  selftests/bpf: add memcg eBPF struct ops test
  samples/bpf: add example memcg eBPF program

 MAINTAINERS                                   |   5 +
 init/Kconfig                                  |  38 ++++
 mm/Makefile                                   |   1 +
 mm/memcontrol.c                               |  26 ++-
 mm/memcontrol_bpf.c                           | 200 ++++++++++++++++++
 mm/memcontrol_bpf.h                           | 103 +++++++++
 samples/bpf/Makefile                          |   2 +
 samples/bpf/memcg_printk.bpf.c                |  30 +++
 samples/bpf/memcg_printk.c                    |  82 +++++++
 .../selftests/bpf/prog_tests/memcg_ops.c      | 117 ++++++++++
 tools/testing/selftests/bpf/progs/memcg_ops.c |  20 ++
 11 files changed, 617 insertions(+), 7 deletions(-)
 create mode 100644 mm/memcontrol_bpf.c
 create mode 100644 mm/memcontrol_bpf.h
 create mode 100644 samples/bpf/memcg_printk.bpf.c
 create mode 100644 samples/bpf/memcg_printk.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/memcg_ops.c
 create mode 100644 tools/testing/selftests/bpf/progs/memcg_ops.c

-- 
2.43.0

