Message-ID: <aCuOsXKCkwa8zkwR@slm.duckdns.org>
Date: Mon, 19 May 2025 10:04:01 -1000
From: Tejun Heo <tj@...nel.org>
To: Amery Hung <ameryhung@...il.com>
Cc: bpf@...r.kernel.org, netdev@...r.kernel.org,
	alexei.starovoitov@...il.com, andrii@...nel.org,
	daniel@...earbox.net, memxor@...il.com, martin.lau@...nel.org,
	kernel-team@...a.com
Subject: Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data

Hello,

On Thu, May 15, 2025 at 02:16:00PM -0700, Amery Hung wrote:
...
> +#define PAGE_SIZE 4096

This might conflict with other definitions. Also, non-4k page sizes look to be
a lot more common on arm. Would this be a problem?

> +static int __tld_init_metadata(int map_fd)
> +{
> +	struct u_tld_metadata *new_metadata;
> +	struct tld_map_value map_val;
> +	int task_fd = 0, err;
> +
> +	task_fd = syscall(SYS_pidfd_open, getpid(), 0);
> +	if (task_fd < 0) {
> +		err = -errno;
> +		goto out;
> +	}
> +
> +	new_metadata = aligned_alloc(PAGE_SIZE, PAGE_SIZE);

Is 4k size limit from UPTR? Is it still 4k on machines with >4k pages? If
this isn't a hard limit from UPTR, would it make sense to encode the size in
the header part of the metadata?

> +static int __tld_init_data(int map_fd)
> +{
> +	struct u_tld_data *new_data = NULL;
> +	struct tld_map_value map_val;
> +	int err, task_fd = 0;
> +
> +	task_fd = syscall(SYS_pidfd_open, gettid(), PIDFD_THREAD);
> +	if (task_fd < 0) {
> +		err = -errno;
> +		goto out;
> +	}
> +
> +	new_data = aligned_alloc(PAGE_SIZE, TLD_DATA_SIZE);

Ditto.

Noob question: does this mean that each thread will map a 4k page no matter
how much data it actually uses?

> +__attribute__((unused))
> +static tld_key_t tld_create_key(int map_fd, const char *name, size_t size)
> +{
> +	int err, i, cnt, sz, off = 0;
> +
> +	if (!READ_ONCE(tld_metadata_p)) {
> +		err = __tld_init_metadata(map_fd);
> +		if (err)
> +			return (tld_key_t) {.off = err};
> +	}
> +
> +	if (!tld_data_p) {
> +		err = __tld_init_data(map_fd);
> +		if (err)
> +			return (tld_key_t) {.off = err};
> +	}
> +
> +	size = round_up(size, 8);
> +
> +	for (i = 0; i < TLD_DATA_CNT; i++) {
> +retry:
> +		cnt = __atomic_load_n(&tld_metadata_p->cnt, __ATOMIC_RELAXED);
> +		if (i < cnt) {
> +			/*
> +			 * Pending tld_create_key() uses size to signal if the metadata has
> +			 * been fully updated.
> +			 */
> +			while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> +						      __ATOMIC_ACQUIRE)))
> +				sched_yield();
> +
> +			if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> +				return (tld_key_t) {.off = -EEXIST};
> +
> +			off += sz;
> +			continue;
> +		}
> +
> +		if (off + size > TLD_DATA_SIZE)
> +			return (tld_key_t) {.off = -E2BIG};
> +
> +		/*
> +		 * Only one tld_create_key() can increase the current cnt by one and
> +		 * takes the latest available slot. Other threads will check again if a new
> +		 * TLD can still be added, and then compete for the new slot after the
> +		 * succeeding thread update the size.
> +		 */
> +		if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> +						 __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> +			goto retry;
> +
> +		strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> +		__atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> +		return (tld_key_t) {.off = off};
> +	}
> +
> +	return (tld_key_t) {.off = -ENOSPC};
> +}

This looks fine to me but I wonder whether storing the key strings out of
line (encoded as offset + length) would be more efficient and less
restrictive in terms of key length. e.g.:

struct key {
        u32 data_len;
        u16 key_off;
        u16 key_len;
};

struct metadata {
        struct key      keys[MAX_KEYS];
        char            key_strs[SOME_SIZE];
};

The logic can stay mostly the same. The only difference would be that the
key string is not inline. Determine the winner in the creation path by
cmpxchg'ing on data_len, but set key_off and key_len only after the key
string has been written. Losing the cmpxchg, or seeing an entry whose
key_len is zero, means one lost the race and should relax and retry. It can
still use the same 4k metadata page but will likely allow more keys while
also relaxing restrictions on key length.

Hmm... maybe making the key string variably sized makes things difficult for
the BPF code. If so (or for any other reasons), please feel free to ignore
the above.

> +#endif /* __TASK_LOCAL_DATA_H */
> diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> new file mode 100644
> index 000000000000..5f48e408a5e5
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
...
> +/**
> + * tld_get_data() - Retrieves a pointer to the TLD associated with the key.
> + *
> + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> + * @key: The key of a TLD saved in tld_maps
> + * @size: The size of the TLD. Must be a known constant value
> + *
> + * Returns a pointer to the TLD data associated with the key; NULL if the key
> + * is not valid or the size is too big
> + */
> +#define tld_get_data(tld_obj, key, size) \
> +	__tld_get_data(tld_obj, (tld_obj)->key_map->key.off - 1, size)
> +
> +__attribute__((unused))
> +__always_inline void *__tld_get_data(struct tld_object *tld_obj, u32 off, u32 size)
> +{
> +	return (tld_obj->data_map->data && off >= 0 && off < TLD_DATA_SIZE - size) ?
> +		(void *)tld_obj->data_map->data + off : NULL;
> +}

Neat.

Generally looks great to me. The only thing I wonder is whether the data
area sizing can be determined at init time rather than fixed to 4k.

Thanks.

-- 
tejun
