Message-ID: <20250425214039.2919818-1-ameryhung@gmail.com>
Date: Fri, 25 Apr 2025 14:40:32 -0700
From: Amery Hung <ameryhung@...il.com>
To: bpf@...r.kernel.org
Cc: netdev@...r.kernel.org,
alexei.starovoitov@...il.com,
andrii@...nel.org,
daniel@...earbox.net,
tj@...nel.org,
martin.lau@...nel.org,
ameryhung@...il.com,
kernel-team@...a.com
Subject: [PATCH RFC v3 0/2] Task local data API
Hi,
This is a respin of the uptr KV store. It has been renamed to task local
data (TLD) because the problem statement and the solution have changed,
and it now bears more similarity to pthread thread-specific data.
* Overview *
This patchset is a continuation of the original UPTR work[0], which aims
to provide a fast way for user space programs to pass per-task hints to
sched_ext schedulers. UPTR built the foundation by supporting sharing
user pages with bpf programs through task local storage maps.
Additionally, sched_ext would like to allow multiple developers to share
a storage without needing to explicitly agree on its layout.
This simplifies code base management and makes experimenting easier.
While a centralized storage layout definition would have worked, the
friction of synchronizing it across different repos is not desirable.
This patchset contains the user space plumbing so that user space and bpf
program developers can exchange per-task hints easily through simple
interfaces.
* Design *
BPF task local data is a simple API for sharing task-specific data
between user space and bpf programs, where data are referred to using
string keys. As shown in the following figure, user space programs can
define a task local data item using bpf_tld_type_var(). The data is
effectively a variable declared with __thread: every thread owns an
independent copy and can access it directly. On the bpf side, a task
local data item first needs to be initialized once for every new task
(e.g., in sched_ext_ops::init_task) using bpf_tld_init_var(). Then, other
bpf programs can get a pointer to the data using bpf_tld_lookup(). The
task local data APIs refer to data using string keys so developers
do not need to deal with addresses of data in a shared storage.
┌─ Application ─────────────────────────────────────────┐
│                          ┌─ library A ──────────────┐ │
│ bpf_tld_type_var(int, X) │ bpf_tld_type_var(int, Y) │ │
│                          └┬─────────────────────────┘ │
└───────┬───────────────────│───────────────────────────┘
        │ X = 123;          │ Y = true;
        V                   V
+ ─ Task local data ─ ─ ─ ─ ─ ─ +
| ┌─ task_kvs_map ────────────┐ |  ┌─ sched_ext_ops::init_task ──────┐
| │ BPF Task local storage    │ |  │ bpf_tld_init_var(&kvs, X);      │
| │  ┌───────────────────┐    │ |<─┤ bpf_tld_init_var(&kvs, Y);      │
| │  │ __uptr *udata     │    │ |  └─────────────────────────────────┘
| │  └───────────────────┘    │ |
| │  ┌───────────────────┐    │ |  ┌─ Other sched_ext_ops op ────────┐
| │  │ __uptr *umetadata │    │ |  │ int *y;                         ├┐
| │  └───────────────────┘    │ |<─┤ y = bpf_tld_lookup(&kvs, Y, 1); ││
| └───────────────────────────┘ |  │ if (y)                          ││
| ┌─ task_kvs_off_map ────────┐ |  │     /* do something */          ││
| │ BPF Task local storage    │ |  └┬────────────────────────────────┘│
| └───────────────────────────┘ |   └─────────────────────────────────┘
+ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ +
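To make the __thread analogy concrete, here is a minimal user space
sketch. It does not use the patchset's macros: the plain __thread
variable X is only a stand-in for what bpf_tld_type_var(int, X) is
described as declaring, and set_x() and caller_x_after_worker_sets() are
hypothetical helper names. It only illustrates that a write from one
thread does not affect another thread's copy.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for a variable declared via bpf_tld_type_var(int, X). */
__thread int X;

/* Thread body: store the requested value into this thread's own copy of X. */
static void *set_x(void *arg)
{
	X = *(int *)arg;
	return NULL;
}

/*
 * Spawn a worker that sets its X to val, wait for it to finish, and
 * return the calling thread's X, which the worker cannot have modified.
 */
int caller_x_after_worker_sets(int val)
{
	pthread_t tid;
	int rc = pthread_create(&tid, NULL, set_x, &val);

	assert(rc == 0);
	pthread_join(tid, NULL);
	return X;
}
```

For example, if the caller sets X = 123 and then calls
caller_x_after_worker_sets(456), the return value is still 123.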
* Implementation *
The task local data API hides memory management from developers.
Internally, it shares user data with bpf programs through udata UPTRs.
Task local data from different compilation units are placed into a
custom "udata" section by the declaration API, bpf_tld_type_var(), so
that they are placed together in memory. User space needs to call
bpf_tld_thread_init() for every new thread to pin the udata pages in the
kernel.
The metadata used to address udata is stored in the umetadata UPTR. It is
generated by constructors inserted by bpf_tld_type_var() and
bpf_tld_thread_init(). umetadata is an array of 64 metadata entries, one
per data item, each containing the key and the offset of the data in
udata. During initialization, bpf_tld_init_var() will search umetadata for
a matching key and cache its offset in task_kvs_off_map. Later,
bpf_tld_lookup() will use the cached offset to retrieve a pointer to
udata.
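The key-to-offset scheme described above can be modeled in plain user
space C. This is only a sketch under stated assumptions, not the code in
the patches: struct tld_meta, TLD_KEY_LEN, tld_find_offset() and
tld_resolve() are hypothetical names, and the key length is made up. The
first helper mirrors the search that bpf_tld_init_var() is described as
doing, the second the offset-to-pointer step of bpf_tld_lookup().

```c
#include <stddef.h>
#include <string.h>

#define TLD_MAX_ENTRIES	64	/* limit stated in this cover letter */
#define TLD_KEY_LEN	32	/* assumed key length, not taken from the patch */

/* Hypothetical model of one umetadata entry: a key and its offset in udata. */
struct tld_meta {
	char key[TLD_KEY_LEN];
	size_t off;
};

/*
 * Model of the init step: scan the metadata array for a matching key.
 * Returns the offset to cache, or -1 if the key is not present.
 */
long tld_find_offset(const struct tld_meta *meta, size_t cnt, const char *key)
{
	for (size_t i = 0; i < cnt && i < TLD_MAX_ENTRIES; i++)
		if (strncmp(meta[i].key, key, TLD_KEY_LEN) == 0)
			return (long)meta[i].off;
	return -1;
}

/*
 * Model of the lookup step: turn a cached offset into a pointer into
 * udata, rejecting offsets that would read past the end of the buffer.
 */
void *tld_resolve(void *udata, size_t udata_size, long off, size_t size)
{
	if (off < 0 || (size_t)off > udata_size ||
	    size > udata_size - (size_t)off)
		return NULL;
	return (char *)udata + off;
}
```

The point of the split is that the string comparison happens once per
task, while later lookups only add a cached offset to a base pointer.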
* Limitation *
Currently, it is assumed that all key-value pairs are known when a
program starts. All compilation units using task local data should be
statically linked together so that values are all placed together in a
udata section and therefore can be shared with bpf through two UPTRs. The
next iteration will explore how bpf task local data can work in dynamic
libraries. Maybe more udata UPTRs will be added to pin the TLS pages
of dynamically loaded modules. Or maybe it will allocate memory for data
instead of relying on __thread, and slightly change how user space
interacts with task local data. The latter approach can also save some
trouble dealing with the restrictions of UPTR.
Some other limitations:
- Total task local data cannot exceed a page
- At most 64 task local data are supported
- Some memory is wasted for data whose size is not a power of two
due to a UPTR limitation
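As a rough illustration of these limits, the sketch below estimates how
much of the page a set of data items would consume. It rests on
assumptions that are not confirmed by this cover letter: that each item
occupies a slot rounded up to the next power of two, and that a page is
4096 bytes. tld_slot_size() and tld_total_size() are hypothetical names.

```c
#include <stddef.h>

#define TLD_PAGE_SIZE	4096	/* assumed page size */
#define TLD_MAX_ENTRIES	64	/* limit stated in this cover letter */

/*
 * Assumed slot size for an item of n bytes: n rounded up to the next
 * power of two. Sizes above a page are returned unchanged so that the
 * caller rejects them.
 */
size_t tld_slot_size(size_t n)
{
	size_t p = 1;

	if (n > TLD_PAGE_SIZE)
		return n;
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Sum the slot sizes of cnt items. Returns the total in bytes, or 0 if
 * the set has more than 64 items or does not fit in one page.
 */
size_t tld_total_size(const size_t *sizes, size_t cnt)
{
	size_t total = 0;

	if (cnt > TLD_MAX_ENTRIES)
		return 0;
	for (size_t i = 0; i < cnt; i++) {
		total += tld_slot_size(sizes[i]);
		if (total > TLD_PAGE_SIZE)
			return 0;
	}
	return total;
}
```

Under these assumptions a 12-byte struct would take a 16-byte slot,
wasting 4 bytes, while an 8-byte value wastes nothing.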
[0] https://lore.kernel.org/bpf/20241023234759.860539-1-martin.lau@linux.dev/
Amery Hung (2):
selftests/bpf: Introduce task local data
selftests/bpf: Test basic workflow of task local data
.../bpf/prog_tests/task_local_data.c | 159 +++++++++++++++
.../bpf/prog_tests/task_local_data.h | 58 ++++++
.../bpf/prog_tests/test_task_local_data.c | 156 +++++++++++++++
.../selftests/bpf/progs/task_local_data.h | 181 ++++++++++++++++++
.../bpf/progs/test_task_local_data_basic.c | 78 ++++++++
.../selftests/bpf/task_local_data_common.h | 49 +++++
6 files changed, 681 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_task_local_data.c
create mode 100644 tools/testing/selftests/bpf/progs/task_local_data.h
create mode 100644 tools/testing/selftests/bpf/progs/test_task_local_data_basic.c
create mode 100644 tools/testing/selftests/bpf/task_local_data_common.h
--
2.47.1