Message-ID: <20250425214039.2919818-1-ameryhung@gmail.com>
Date: Fri, 25 Apr 2025 14:40:32 -0700
From: Amery Hung <ameryhung@...il.com>
To: bpf@...r.kernel.org
Cc: netdev@...r.kernel.org,
	alexei.starovoitov@...il.com,
	andrii@...nel.org,
	daniel@...earbox.net,
	tj@...nel.org,
	martin.lau@...nel.org,
	ameryhung@...il.com,
	kernel-team@...a.com
Subject: [PATCH RFC v3 0/2] Task local data API

Hi,

This is a respin of the uptr KV store. It has been renamed to task local
data (TLD) because the problem statement and the solution have changed,
and it now more closely resembles pthread thread-specific data.

* Overview *

This patchset is a continuation of the original UPTR work[0], which aims
to provide a fast way for user space programs to pass per-task hints to
sched_ext schedulers. UPTR built the foundation by supporting sharing
user pages with bpf programs through task local storage maps.

Additionally, sched_ext would like to allow multiple developers to share
a storage without having to explicitly agree on its layout.
This simplifies code base management and makes experimenting easier.
While a centralized storage layout definition would have worked, the
friction of synchronizing it across different repos is not desirable.

This patchset contains the user space plumbing so that user space and bpf
program developers can exchange per-task hints easily through simple
interfaces.

* Design *

BPF task local data is a simple API for sharing task-specific data
between user space and bpf programs, where data are referred to using
string keys. As shown in the following figure, user space programs can
define a task local data using bpf_tld_type_var(). The data is
effectively a variable declared with __thread: every thread owns an
independent copy that it can access directly. On the bpf side, a task
local data first needs to be initialized once for every new task (e.g.,
in sched_ext_ops::init_task) using bpf_tld_init_var(). Then, other bpf
programs can get a pointer to the data using bpf_tld_lookup(). The task
local data APIs refer to data using string keys so that developers
do not need to deal with addresses of data in a shared storage.

 ┌─ Application ─────────────────────────────────────────┐
 │                          ┌─ library A ──────────────┐ │
 │ bpf_tld_type_var(int, X) │ bpf_tld_type_var(int, Y) │ │
 │                          └┬─────────────────────────┘ │
 └───────┬───────────────────│───────────────────────────┘
         │ X = 123;          │ Y = true;
         V                   V
 + ─ Task local data ─ ─ ─ ─ ─ ─ +
 | ┌─ task_kvs_map ────────────┐ |  ┌─ sched_ext_ops::init_task ──────┐
 | │ BPF Task local storage    │ |  │ bpf_tld_init_var(&kvs, X);      │
 | │  ┌───────────────────┐    │ |<─┤ bpf_tld_init_var(&kvs, Y);      │
 | │  │ __uptr *udata     │    │ |  └─────────────────────────────────┘ 
 | │  └───────────────────┘    │ |
 | │  ┌───────────────────┐    │ |  ┌─ Other sched_ext_ops op ────────┐
 | │  │ __uptr *umetadata │    │ |  │ int *y;                         ├┐
 | │  └───────────────────┘    │ |<─┤ y = bpf_tld_lookup(&kvs, Y, 1); ││
 | └───────────────────────────┘ |  │ if (y)                          ││
 | ┌─ task_kvs_off_map ────────┐ |  │     /* do something */          ││
 | │ BPF Task local storage    │ |  └┬────────────────────────────────┘│
 | └───────────────────────────┘ |   └─────────────────────────────────┘
 + ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ +
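
To make the "variable declared with __thread" semantics concrete, here is
a minimal user-space sketch using plain pthreads. It does not use the TLD
API at all; `hint`, `worker` and `tls_copies_are_independent` are made-up
names, and the only point is that each thread reads back the value it
wrote to its own copy.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread hint: plain __thread, not bpf_tld_type_var(). */
static __thread int hint;

struct worker_arg {
	int value;    /* value this thread stores in its own copy */
	int observed; /* value the thread reads back afterwards */
};

void *worker(void *p)
{
	struct worker_arg *arg = p;

	hint = arg->value;    /* touches only this thread's copy */
	arg->observed = hint;
	return NULL;
}

/*
 * Run two threads that write different values. Returns 1 if each thread
 * saw its own value and the caller's copy was left untouched, else 0.
 */
int tls_copies_are_independent(void)
{
	struct worker_arg a = { .value = 123 }, b = { .value = 456 };
	pthread_t ta, tb;

	hint = 7; /* the calling thread's own copy */

	if (pthread_create(&ta, NULL, worker, &a))
		return 0;
	if (pthread_create(&tb, NULL, worker, &b)) {
		pthread_join(ta, NULL);
		return 0;
	}
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);

	return a.observed == 123 && b.observed == 456 && hint == 7;
}
```

With TLD, the same per-thread copy would additionally be visible to bpf
programs through the udata UPTR, which plain __thread does not provide.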

* Implementation *

The task local data API hides memory management from developers.
Internally, it shares user data with bpf programs through udata UPTRs.
Task local data from different compilation units are placed into a
custom "udata" section by the declaration API, bpf_tld_type_var(), so
that they are placed together in memory. User space needs to
call bpf_tld_thread_init() for every new thread to pin the udata pages
to the kernel.

The metadata used to address udata is stored in the umetadata UPTR. It is
generated by constructors inserted by bpf_tld_type_var() and
bpf_tld_thread_init(). umetadata is an array of 64 metadata entries, one
per data, each containing the key and the offset of the data in udata.
During initialization, bpf_tld_init_var() will search umetadata for
a matching key and cache its offset in task_kvs_off_map. Later,
bpf_tld_lookup() will use the cached offset to retrieve a pointer to
udata.
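
The search-then-cache flow above can be modeled in plain user-space C.
This is only a sketch and not the code in the patches: the struct layouts,
the 32-byte key length, the 4 KiB page size and every model_* name are
assumptions made for illustration. A single int stands in for the cached
offset that would live in task_kvs_off_map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLD_MAX_ENTRIES 64   /* limit stated in the cover letter */
#define TLD_KEY_LEN     32   /* assumption for this sketch */
#define TLD_PAGE_SIZE   4096 /* assumption: one 4 KiB page of udata */

/* One umetadata slot: the key and the offset of its value in udata. */
struct tld_meta {
	char key[TLD_KEY_LEN];
	uint16_t off;
};

/* Stand-ins for the two UPTR-backed regions of a single task. */
struct tld_task {
	struct tld_meta meta[TLD_MAX_ENTRIES];
	unsigned int cnt;
	unsigned char udata[TLD_PAGE_SIZE];
};

/* What the user-space constructors would record for each declared key. */
int model_add_key(struct tld_task *t, const char *key, uint16_t off)
{
	if (t->cnt >= TLD_MAX_ENTRIES || strlen(key) >= TLD_KEY_LEN)
		return -1;
	strcpy(t->meta[t->cnt].key, key);
	t->meta[t->cnt].off = off;
	t->cnt++;
	return 0;
}

/* Model of bpf_tld_init_var(): linear key search, then cache the offset. */
int model_init_var(const struct tld_task *t, const char *key, int *cached_off)
{
	for (unsigned int i = 0; i < t->cnt && i < TLD_MAX_ENTRIES; i++) {
		if (strncmp(t->meta[i].key, key, TLD_KEY_LEN) == 0) {
			*cached_off = t->meta[i].off;
			return 0;
		}
	}
	*cached_off = -1; /* key not declared by user space */
	return -1;
}

/* Model of bpf_tld_lookup(): no string compare, only a bounds check. */
void *model_lookup(struct tld_task *t, int cached_off, size_t size)
{
	if (cached_off < 0 || size > TLD_PAGE_SIZE ||
	    (size_t)cached_off > TLD_PAGE_SIZE - size)
		return NULL;
	return t->udata + cached_off;
}

/* Walk through the flow once; returns 1 if every step behaves as expected. */
int model_selftest(void)
{
	struct tld_task t = { .cnt = 0 };
	int off_y, off_z, in = 1, out = 0;
	void *p;

	if (model_add_key(&t, "X", 0) || model_add_key(&t, "Y", 8))
		return 0;
	if (model_init_var(&t, "Y", &off_y) || off_y != 8)
		return 0;
	if (model_init_var(&t, "Z", &off_z) != -1) /* unknown key */
		return 0;

	memcpy(t.udata + 8, &in, sizeof(in));      /* user space: Y = 1 */
	p = model_lookup(&t, off_y, sizeof(out));  /* bpf side: get pointer */
	if (!p)
		return 0;
	memcpy(&out, p, sizeof(out));
	return out == 1 && model_lookup(&t, off_z, sizeof(out)) == NULL;
}
```

The string comparison is paid once per task at init time; later lookups
only add an offset to the udata base, which is what keeps the hot path
cheap in this model.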

* Limitation *

Currently, it is assumed that all key-value pairs are known when a
program starts. All compilation units using task local data should be
statically linked together so that values are all placed together in a
udata section and therefore can be shared with bpf through two UPTRs.
The next iteration will explore how bpf task local data can work in
dynamic libraries. Maybe more udata UPTRs will be added to pin the TLS
pages of dynamically loaded modules. Or maybe it will allocate memory for
data instead of relying on __thread, and slightly change how user space
interacts with task local data. The latter approach can also avoid some
of the trouble of dealing with the restrictions of UPTR.

Some other limitations:
 - Total task local data cannot exceed a page
 - Only 64 task local data are supported
 - Some memory is wasted for data whose size is not a power of two
   due to a UPTR limitation
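
As a rough illustration of how these limits interact, the sketch below
packs values into one page while padding each to a power of two. The
padding-and-alignment rule, the 4 KiB page size and the names
round_up_pow2() and tld_layout() are assumptions for this example; the
patches may lay data out differently.

```c
#include <assert.h>
#include <stddef.h>

#define TLD_PAGE_SIZE   4096 /* assumption: 4 KiB page */
#define TLD_MAX_ENTRIES 64   /* limit stated in the cover letter */

/* Round n up to the next power of two; 0 and 1 both map to 1. */
size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Assign an offset to each of `cnt` values, padding every value to a
 * power-of-two slot aligned to its own size. Returns the number of bytes
 * used, or 0 if the entry count or the one-page budget is exceeded.
 */
size_t tld_layout(const size_t *sizes, size_t cnt, size_t *offs)
{
	size_t used = 0;

	if (cnt > TLD_MAX_ENTRIES)
		return 0;

	for (size_t i = 0; i < cnt; i++) {
		size_t slot;

		if (sizes[i] > TLD_PAGE_SIZE)
			return 0;
		slot = round_up_pow2(sizes[i]);

		/* Align the running offset to the slot size. */
		used = (used + slot - 1) & ~(slot - 1);
		if (used > TLD_PAGE_SIZE - slot)
			return 0;

		offs[i] = used;
		used += slot;
	}
	return used;
}
```

Under these assumptions, three values of 4, 12 and 24 bytes (40 bytes of
payload) land at offsets 0, 16 and 32 and occupy 64 bytes, so 24 bytes go
to padding.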

[0] https://lore.kernel.org/bpf/20241023234759.860539-1-martin.lau@linux.dev/


Amery Hung (2):
  selftests/bpf: Introduce task local data
  selftests/bpf: Test basic workflow of task local data

 .../bpf/prog_tests/task_local_data.c          | 159 +++++++++++++++
 .../bpf/prog_tests/task_local_data.h          |  58 ++++++
 .../bpf/prog_tests/test_task_local_data.c     | 156 +++++++++++++++
 .../selftests/bpf/progs/task_local_data.h     | 181 ++++++++++++++++++
 .../bpf/progs/test_task_local_data_basic.c    |  78 ++++++++
 .../selftests/bpf/task_local_data_common.h    |  49 +++++
 6 files changed, 681 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_task_local_data.c
 create mode 100644 tools/testing/selftests/bpf/progs/task_local_data.h
 create mode 100644 tools/testing/selftests/bpf/progs/test_task_local_data_basic.c
 create mode 100644 tools/testing/selftests/bpf/task_local_data_common.h

-- 
2.47.1

