lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABFh=a5Z7i+Yk4i42=BSt9+9NLQtY-CP73M5kQiroQpNtsELfQ@mail.gmail.com>
Date: Mon, 28 Jul 2025 09:54:35 -0400
From: Emil Tsalapatis <linux-lists@...alapatis.com>
To: Amery Hung <ameryhung@...il.com>
Cc: bpf@...r.kernel.org, netdev@...r.kernel.org, alexei.starovoitov@...il.com, 
	andrii@...nel.org, daniel@...earbox.net, tj@...nel.org, memxor@...il.com, 
	martin.lau@...nel.org, kernel-team@...a.com
Subject: Re: [PATCH bpf-next v6 0/3] Task local data

Some feedback on the cover letter:

On Thu, Jul 17, 2025 at 12:48 PM Amery Hung <ameryhung@...il.com> wrote:
>
> * Motivation *
>
> CPU schedulers can potentially make a better decision with hints from
> user space process. To support experimenting user space hinting with
> sched_ext, there needs a mechanism to pass a "per-task hint" from user
> space to the bpf scheduler "efficiently".
>

I think this is underselling the feature a bit, isn't it useful for
all BPF users?
Otherwise why isn't it in the scx repo?

> The proposed mechanism is task local data. Similar to pthread key or
> __thread, it allows users to define thread-specific data. In addition,
> the user pages that back task local data are pinned to the kernel to
> share with bpf programs directly. As a result, user space programs
> can directly update per-thread hints, and then bpf program can read
> the hint with little overhead. The diagram in the design section gives
> a sneak peek of how it works.
>
>
> * Overview *
>
> Task local data defines an abstract storage type for storing data specific
> to each task and provides user space and bpf libraries to access it. The
> result is a fast and easy way to share per-task data between user space
> and bpf programs. The intended use case is sched_ext, where user space
> programs will pass hints to sched_ext bpf programs to affect task
> scheduling.
>

Same as above, why is this a sched_ext feature? Maybe some of the design
choices are informed by it, but there is nothing sched_ext specific in here.

> Task local data is built on top of task local storage map and UPTR[0]
> to achieve fast per-task data sharing. UPTR is a type of special field
> supported in task local storage map value. A user page assigned to a UPTR
> will be pinned by the kernel when the map is updated. Therefore, user
> space programs can update data seen by bpf programs without syscalls.
>
> Additionally, unlike most bpf maps, task local data does not require a
> static map value definition. This design is driven by sched_ext, which
> would like to allow multiple developers to share a storage without the
> need to explicitly agree on the layout of it. While a centralized layout
> definition would have worked, the friction of synchronizing it across
> different repos is not desirable. This simplify code base management and
> makes experimenting easier.
>

This text is hard to follow because the point it makes belongs in the design
section, this is a design point that does not really matter at this
point in the text.
At least make it more abstract maybe?

> In the rest of the cover letter, "task local data" is used to refer to
> the abstract storage and TLD is used to denote a single data entry in
> the storage.
>
>
> * Design *
>
> Task local data library provides simple APIs for user space and bpf
> through two header files, task_local_data.h and task_loca_data.bpf.h,

Typo: task_loca_data ->task_local_data

> respectively. The usage is illustrated in the following diagram.
> An entry of data in the task local data, TLD, first needs to be defined
> with TLD_DEFINE_KEY() with the size of the data and a name associated with
> the data. The macro defines and initialize an opaque key object of

Typo: initialize -> initializes

> tld_key_t type, which can be used to locate a TLD. The same key may be
> passed to tld_get_data() in different threads, and a pointer to data
> specific to the calling thread will be returned. The pointer will
> remain valid until the process terminates, so there is not need to call

Typo: not need -> no need

> tld_get_data() in subsequent accesses.
>
> TLD_DEFINE_KEY() is allowed to define TLDs up to roughly a page. In the
> case when a TLD can only be known and created on the fly,
> tld_create_key() can be called. Since the total TLD size cannot be known
> beforehand, a memory of size TLD_DYN_DATA_SIZE is allocated for each
> thread to accommodate them.
>
> On the bpf side, programs will use also use tld_get_data() to locate
> TLDs. The arugments contain a name and a key to a TLD. The name is
> used for the first tld_get_data() to a TLD, which will lookup the TLD
> by name and save the corresponding key to a task local data map,
> tld_key_map. The map value type, struct tld_keys, __must__ be defined by
> developers. It should contain keys used in the compilation unit.
>
>
>  ┌─ Application ───────────────────────────────────────────────────────┐
>  │ TLD_DEFINE_KEY(kx, "X", 4);      ┌─ library A ─────────────────────┐│
>  │                                  │ void func(...)                  ││
>  │ int main(...)                    │ {                               ││
>  │ {                                │     tld_key_t ky;               ││
>  │      int *x;                     │     bool *y;                    ││
>  │                                  │                                 ││
>  │      x = tld_get_data(fd, kx);   │     ky = tld_create_key("Y", 1);││
>  │      if (x) *x = 123;            │     y = tld_get_data(fd, ky);   ││
>  │                         ┌────────┤     if (y) *y = true;           ││
>  │                         │        └─────────────────────────────────┘│
>  └───────┬─────────────────│───────────────────────────────────────────┘
>          V                 V
>  + ─ Task local data ─ ─ ─ ─ ─ +  ┌─ BPF program ──────────────────────┐
>  | ┌─ tld_data_map ──────────┐ |  │ struct tld_object obj;             │
>  | │ BPF Task local storage  │ |  │ bool *y;                           │
>  | │                         │ |  │ int *x;                            │
>  | │ __uptr *data            │ |  │                                    │
>  | │ __uptr *metadata        │ |  │ if (tld_init_object(task, &obj))   │
>  | └─────────────────────────┘ |  │     return 0;                      │
>  | ┌─ tld_key_map ───────────┐ |  │                                    │
>  | │ BPF Task local storage  │ |  │ x = tld_get_data(&obj, kx, "X", 4);│
>  | │                         │ |<─┤ if (x) /* do something */          │
>  | │ tld_key_t kx;           │ |  │                                    │
>  | │ tld_key_t ky;           │ |  │ y = tld_get_data(&obj, ky, "Y", 1);│
>  | └─────────────────────────┘ |  │ if (y) /* do something */          │
>  + ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ +  └────────────────────────────────────┘
>
>
>
> * Implementation *
>
> Task local data defines the storage to be a task local storage map with
> two UPTRs, data and metadata. Data points to a blob of memory for storing
> TLDs individual to every task with the offset of data in a page. Metadata,

Nit: Maybe just use the name of the struct, u_tld_metadata?

> individual to each process and shared by its threads, records the total
> number and size of TLDs and the metadata of each TLD. Metadata for a
> TLD contains the key name and the size of the TLD.
>
>   struct u_tld_data {
>           u64 start;
>           char data[PAGE_SIZE - 8];
>   };
>
>   struct u_tld_metadata {
>           u8 cnt;
>           u16 size;
>           struct metadata data[TLD_DATA_CNT];

Rename data -> metadata? That's how it is in the code.

>   };
>
> Both user space and bpf API follow the same protocol when accessing
> task local data. A pointer to a TLD is located by a key. tld_key_t
> effectively is the offset of a TLD in data. To add a TLD, user space

typo: user space API -> the user space API

> API, loops through metadata->data until an empty slot is found and update
> it. It also adds sizes of prior TLDs along the way to derive the offset.
> To locate a TLD in bpf when the first time tld_get_data() is called,
> __tld_fetch_key() also loops through metadata->data until the name is

Same here, isn't it ->metadata?

> found. The offset is also derived by adding sizes. When the TLD is not
> found, the current TLD count is cached instead to skip name comparison
> that has been done. The detail of task local data operations can be found
> in patch 1.
>
>
> * Misc *
>
> The metadata can potentially use run-length encoding for names to reduce
> memory wastage and support save more TLDs. I have a version that works,
> but the selftest takes a bit longer to finish. More investigation needed
> to find the root cause. I will save this for the future when there is a
> need to store more than 63 TLDs.
>
>
> [0] https://lore.kernel.org/bpf/20241023234759.860539-1-martin.lau@linux.dev/
>

Reviewed-by: Emil Tsalapatis <emil@...alapatis.com>



> ---
>
> v5 -> v6
>   - Address Andrii's comment
>   - Fix verification failure in no_alu32
>   - Some cleanup
>   v5: https://lore.kernel.org/bpf/20250627233958.2602271-1-ameryhung@gmail.com/
>
> v4 -> v5
>   - Add an option to free memory on thread exit to prevent memory leak
>   - Add an option to reduce memory waste if the allocator can
>     use just enough memory to fullfill aligned_alloc() (e.g., glibc)
>   - Tweak bpf API
>       - Remove tld_fetch_key() as it does not work in init_tasl
>       - tld_get_data() now tries to fetch key if it is not cached yet
>   - Optimize bpf side tld_get_data()
>       - Faster fast path
>       - Less code
>   - Use stdatomic.h in user space library with seq_cst order
>   - Introduce TLD_DEFINE_KEY() as the default TLD creation API for
>     easier memory management.
>       - TLD_DEFINE_KEY() can consume memory up to a page and no memory
>         is wasted since their size is known before per-thread data
>         allocation.
>       - tld_create_key() can only use up to TLD_DYN_DATA_SIZE. Since
>         tld_create_key can run any time even after per-thread data
>         allocation, it is impossible to predict the total size. A
>         configurable size of memory is allocated on top of the total
>         size of TLD_DEFINE_KEY() to accommodate dynamic key creation.
>   - Add tld prefix to all macros
>   - Replace map_update(NO_EXIST) in __tld_init_data() with cmpxchg()
>   - No more +1,-1 dance on the bpf side
>   - Reduce printf from ASSERT in race test
>   - Try implementing run-length encoding for name and decide to
>     save it for the future
>   v4: https://lore.kernel.org/bpf/20250515211606.2697271-1-ameryhung@gmail.com/
>
> v3 -> v4
>   - API improvements
>       - Simplify API
>       - Drop string obfuscation
>       - Use opaque type for key
>       - Better documentation
>   - Implementation
>       - Switch to dynamic allocation for per-task data
>       - Now offer as header-only libraries
>       - No TLS map pinning; leave it to users
>   - Drop pthread dependency
>   - Add more invalid tld_create_key() test
>   - Add a race test for tld_create_key()
>   v3: https://lore.kernel.org/bpf/20250425214039.2919818-1-ameryhung@gmail.com/
>
>
> Amery Hung (3):
>   selftests/bpf: Introduce task local data
>   selftests/bpf: Test basic task local data operations
>   selftests/bpf: Test concurrent task local data key creation
>
>  .../bpf/prog_tests/task_local_data.h          | 388 ++++++++++++++++++
>  .../bpf/prog_tests/test_task_local_data.c     | 297 ++++++++++++++
>  .../selftests/bpf/progs/task_local_data.bpf.h | 227 ++++++++++
>  .../bpf/progs/test_task_local_data.c          |  65 +++
>  4 files changed, 977 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.h
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/test_task_local_data.c
>  create mode 100644 tools/testing/selftests/bpf/progs/task_local_data.bpf.h
>  create mode 100644 tools/testing/selftests/bpf/progs/test_task_local_data.c
>
> --
> 2.47.1
>
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ