lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzaFzSmLDOu=GQk=0p9NoDn2boVBSBRm4Ejh3BdVB6bmSQ@mail.gmail.com>
Date: Thu, 1 May 2025 13:38:55 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Amery Hung <ameryhung@...il.com>
Cc: bpf@...r.kernel.org, netdev@...r.kernel.org, alexei.starovoitov@...il.com, 
	andrii@...nel.org, daniel@...earbox.net, tj@...nel.org, martin.lau@...nel.org, 
	kernel-team@...a.com
Subject: Re: [PATCH RFC v3 1/2] selftests/bpf: Introduce task local data

On Fri, Apr 25, 2025 at 2:40 PM Amery Hung <ameryhung@...il.com> wrote:
>
> Task local data provides simple and fast bpf and user space APIs to
> exchange per-task data through task local storage map. The user space
> first declares task local data using bpf_tld_type_key_var() or
> bpf_tld_type_var(). The data is a thread-specific variable which
> every thread has its own copy. Then, a bpf_tld_thread_init() needs to
> be called for every thread to share the data with bpf. Finally, users
> can directly read/write thread local data without bpf syscall.
>
> For bpf programs to see task local data, every data need to be
> initialized once for every new task using bpf_tld_init_var(). Then
> bpf programs can retrieve pointers to the data with bpf_tld_lookup().
>
> The core of task local storage implementation relies on UPTRs. They
> pin user pages to the kernel so that user space can share data with bpf
> programs without syscalls. Both data and the metadata used to access
> data are pinned via UPTRs.
>
> A limitation of UPTR makes the implementation of task local data
> less trivial than it sounds: memory pinned to UPTR cannot exceed a
> page and must not cross the page boundary. In addition, the data
> declaration uses __thread identifier and therefore does not have
> directly control over the memory allocation. Therefore, several
> tricks and checks are used to make it work.
>
> First, task local data declaration APIs place data in a custom "udata"
> section so that data from different compilation units will be contiguous
> in the memory and can be pinned using two UPTRs if they are smaller than
> one page.
>
> To avoid each data from spanning across two pages, they are each aligned
> to the smallest power of two larget than their sizes.
>
> As we don't control the memory allocation for data, we need to figure
> out the layout of user defined data. This is done by the data
> declaration API and bpf_tld_thread_init(). The data declaration API
> will insert constructors for all data, and they are used to find the
> size and offset of data as well as the beginning and end of the whole
> udata section. Then, bpf_tld_thread_init() performs a per-thread check
> to make sure no data will cross the page boundary as udata can start at
> different offset in a page.
>
> Note that for umetadata, we directly aligned_alloc() memory for it and
> assigned to the UPTR. This is only done once for every process as every
> tasks shares the same umetadata. The actual thread-specific data offset
> will be adjusted in the bpf program when calling bpf_tld_init_var().
>
> Signed-off-by: Amery Hung <ameryhung@...il.com>
> ---
>  .../bpf/prog_tests/task_local_data.c          | 159 +++++++++++++++
>  .../bpf/prog_tests/task_local_data.h          |  58 ++++++
>  .../selftests/bpf/progs/task_local_data.h     | 181 ++++++++++++++++++
>  .../selftests/bpf/task_local_data_common.h    |  41 ++++
>  4 files changed, 439 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.c
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.h
>  create mode 100644 tools/testing/selftests/bpf/progs/task_local_data.h
>  create mode 100644 tools/testing/selftests/bpf/task_local_data_common.h
>

[...]

> +/**
> + * @brief bpf_tld_key_type_var() declares a task local data shared with bpf
> + * programs. The data will be a thread-specific variable which a user space
> + * program can directly read/write, while bpf programs will need to lookup
> + * with the string key.
> + *
> + * @param key The string key a task local data will be associated with. The
> + * string will be truncated if the length exceeds TASK_LOCAL_DATA_KEY_LEN
> + * @param type The type of the task local data
> + * @param var The name of the task local data
> + */
> +#define bpf_tld_key_type_var(key, type, var)                                   \
> +__thread type var SEC("udata") __aligned(ROUND_UP_POWER_OF_TWO(sizeof(type))); \
> +                                                                               \
> +__attribute__((constructor))                                                   \
> +void __bpf_tld_##var##_init(void)                                              \
> +{                                                                              \
> +       _Static_assert(sizeof(type) < PAGE_SIZE,                                \
> +                      "data size must not exceed a page");                     \
> +       __bpf_tld_var_init(key, &var, sizeof(type));                            \
> +}
> +
> +/**
> + * @brief bpf_tld_key_type_var() declares a task local data shared with bpf

declares -> defines, declaration would involve extern and no storage
allocated (it's only to reference something *defined* in another
compilation unit)

> + * programs. The data will be a thread-specific variable which a user space
> + * program can directly read/write, while bpf programs will need to lookup
> + * the data with the string key same as the variable name.
> + *
> + * @param type The type of the task local data
> + * @param var The name of the task local data as well as the name of the
> + * key. The key string will be truncated if the length exceeds
> + * TASK_LOCAL_DATA_KEY_LEN.
> + */
> +#define bpf_tld_type_var(type, var) \
> +       bpf_tld_key_type_var(#var, type, var)
> +
> +/**
> + * @brief bpf_tld_thread_init() initializes the task local data for the current
> + * thread. All data are undefined from a bpf program's point of view until
> + * bpf_tld_thread_init() is called.
> + *
> + * @return 0 on success; negative error number on failure
> + */
> +int bpf_tld_thread_init(void);
> +
> +#endif
> diff --git a/tools/testing/selftests/bpf/progs/task_local_data.h b/tools/testing/selftests/bpf/progs/task_local_data.h
> new file mode 100644
> index 000000000000..7358993ee634
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/task_local_data.h

given this is BPF-only header, I'd make it more explicit by using
.bpf.h suffix, just like libbpf's usdt.bpf.h

[...]

> +/**
> + * @brief bpf_tld_lookup() lookups the task KV store using the cached offset
> + * corresponding to the key.
> + *
> + * @param tld A pointer to a valid bpf_task_local_data object initialized by bpf_tld_init()
> + * @param key The key used to lookup the task KV store. Should be one of the
> + * symbols defined in struct task_local_data_offsets, not a string
> + * @param size The size of the value. Must be a known constant value
> + * @return A pointer to the value corresponding to the key; NULL if the offset
> + * if not cached or the size is too big
> + */
> +#define bpf_tld_lookup(tld, key, size) __bpf_tld_lookup(tld, (tld)->off_map->key, size)
> +__attribute__((unused))
> +static void *__bpf_tld_lookup(struct bpf_task_local_data *tld, short cached_off, int size)

I'd probably use some opaque type for these offsets to discourage user
application to try to create them, instead of strictly using offsets
returned from API. So:

typedef struct { int off; } tld_off_t;

and then just use tld_off_t everywhere?


> +{
> +       short page_off, page_idx;
> +
> +       if (!cached_off--)
> +               return NULL;
> +
> +       page_off = cached_off & ~PAGE_IDX_MASK;
> +       page_idx = !!(cached_off & PAGE_IDX_MASK);
> +
> +       if (page_idx) {
> +               return (tld->data_map->udata[1].page && page_off < PAGE_SIZE - size) ?
> +                       (void *)tld->data_map->udata[1].page + page_off : NULL;
> +       } else {
> +               return (tld->data_map->udata[0].page && page_off < PAGE_SIZE - size) ?
> +                       (void *)tld->data_map->udata[0].page + page_off : NULL;
> +       }
> +}

[...]

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ