netdev - Re: [PATCH RFC v3 0/2] Task local data API

Message-ID: <CAADnVQ+N5u8KTtcsKOWcDmQ4X=OmvTf+LfLhY=VLQJ7q-=Li7Q@mail.gmail.com>
Date: Fri, 2 May 2025 10:55:40 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Amery Hung <ameryhung@...il.com>, bpf <bpf@...r.kernel.org>, 
	Network Development <netdev@...r.kernel.org>, Andrii Nakryiko <andrii@...nel.org>, 
	Daniel Borkmann <daniel@...earbox.net>, Tejun Heo <tj@...nel.org>, 
	Martin KaFai Lau <martin.lau@...nel.org>, Kernel Team <kernel-team@...a.com>
Subject: Re: [PATCH RFC v3 0/2] Task local data API

On Thu, May 1, 2025 at 7:22 PM Andrii Nakryiko
<andrii.nakryiko@...il.com> wrote:
>
> I wasn't trying to optimize those few bytes taken by szs, tbh.
> Allocating from the end of the page bakes in the assumption that we
> won't ever need more than one page. I don't know if I'd do that. But
> we can just track "next available offset" instead, so it doesn't
> really matter much.

Right. That works too.
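
For reference, tracking a "next available offset" can be sketched as a small bump allocator. This is only an illustration, not code from the patch set: struct tld_alloc, tld_alloc_off() and the TLD_DATA_SIZE limit are made-up names, and only the next_off idea comes from the discussion above. Supporting more than one page would only mean raising the limit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: the per-thread data area is one 4096-byte page for now. */
#define TLD_DATA_SIZE 4096

/* Hypothetical allocator state; only "next_off" comes from the thread. */
struct tld_alloc {
    atomic_int next_off;    /* first byte not yet handed out */
};

/* Reserve `size` bytes. Returns the offset, or -1 if the area is full. */
static int tld_alloc_off(struct tld_alloc *a, int size)
{
    assert(size > 0);

    int off = atomic_load(&a->next_off);
    do {
        if (size > TLD_DATA_SIZE - off)
            return -1;
        /* On failure the CAS reloads `off`, so the bound is rechecked. */
    } while (!atomic_compare_exchange_weak(&a->next_off, &off, off + size));

    return off;
}
```

Starting from a zeroed struct tld_alloc, reserving 8 bytes and then 16 bytes yields offsets 0 and 8.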


> >
> > I'm not quite sure how different processes can do it locklessly.
>
> There are no different processes, it's all one process, many
> threads... Or is that what you meant? tld_metadata is *per process*,
> tld_data is *per thread*. Processes don't need to coordinate anything
> between themselves, only threads within the process.

Yeah. I confused myself thinking that we need to support this
through fork/exec. Since they will be different processes
they will have their own task local storage map elements,
so any kind of signaling into a child needs to be done in user space.
Using bpf tls map won't work.

So using one "tld" library in multiple threads within a single
process works. No need to complicate things by involving the kernel tls map.
"tld" library can keep whatever state it needs,
centralized locking, etc.
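
A minimal sketch of what "centralized locking" inside such a user-space library could look like, assuming one process-wide table guarded by a single lock. struct tld_registry, tld_key_offset() and the two limits are hypothetical names chosen for illustration, and pthread_mutex is used only because it is the simplest lock to show.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Assumed limits, for illustration only. */
#define MAX_KEY_CNT 16
#define MAX_KEY_LEN 32

/* Hypothetical per-process key table owned by the "tld" library. */
struct tld_registry {
    pthread_mutex_t lock;
    int cnt;
    int next_off;
    char keys[MAX_KEY_CNT][MAX_KEY_LEN];
    int offs[MAX_KEY_CNT];
};

static struct tld_registry tld_reg = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Return the offset for `key`, registering it with `size` bytes if new.
 * Returns -1 if the key is too long or the table is full. */
static int tld_key_offset(const char *key, int size)
{
    int off = -1;

    assert(key != NULL && size > 0);
    if (strlen(key) >= MAX_KEY_LEN)
        return -1;

    pthread_mutex_lock(&tld_reg.lock);
    for (int i = 0; i < tld_reg.cnt; i++) {
        if (strcmp(tld_reg.keys[i], key) == 0) {
            off = tld_reg.offs[i];      /* already registered */
            goto out;
        }
    }
    if (tld_reg.cnt < MAX_KEY_CNT) {
        int i = tld_reg.cnt++;
        strcpy(tld_reg.keys[i], key);   /* length checked above */
        off = tld_reg.offs[i] = tld_reg.next_off;
        tld_reg.next_off += size;
    }
out:
    pthread_mutex_unlock(&tld_reg.lock);
    return off;
}
```

Every thread that asks for the same key gets the same offset, and none of that state has to live in a bpf map.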

> As for how I'd do offset allocation and key addition locklessly. You
> are right that it can't be done completely locklessly, but just
> looping and yielding probably would be fine.
>
> Then the sequence of adding the key would be something like below.
> I've modified tld_metadata a bit to make this simpler and more
> economical (and I fixed definition of keys array of array of chars,
> oops):
>
> struct tld_metadata {
>     int cnt;
>     int next_off;
>     char keys[MAX_KEY_CNT][MAX_KEY_LEN];
>     __s16 offs[MAX_KEY_CNT]; /* signed: -1 marks a slot not yet finalized */
> };
>
> struct tld_metadata *m = ...;
> const char *new_key = ...;
> int i = 0;
>
> /* all m->offs[i] are set to -1 on creation */
> again:
>
>     int key_cnt = m->cnt;
>     for (; i < key_cnt; i++) {
>         while (m->offs[i] < 0) /* update in progress */
>             sched_yield();
>
>         if (strcmp(m->keys[i], new_key) == 0)
>             return m->offs[i];
>     }
>
>     if (!cmpxchg(&m->cnt, key_cnt, key_cnt + 1))
>         goto again; /* we raced, key might have been added already, recheck, but keep i */
>
>     /* slot key_cnt is ours, we need to calculate and assign offset */
>     int new_off = m->next_off;
>     m->next_off = new_off + key_sz;
>
>     m->keys[key_cnt][0] = '\0';
>     strncat(m->keys[key_cnt], new_key, MAX_KEY_LEN - 1); /* leave room for NUL */
>
>     /* MEMORY BARRIERS SHOULD BE CAREFULLY CONSIDERED */
>
>     m->offs[key_cnt] = new_off; /* this is finalizing key -> offset assignment */
>
>     /* MEMORY BARRIERS SHOULD BE CAREFULLY CONSIDERED */
>
>     return new_off; /* we are done */
>
> Something like that. There is that looping and yield to not miss
> someone else winning the race and adding a key, so that's the locking
> part. But given that adding a key definition is supposed to be one
> time operation (per key), I don't think we should be fancy with
> locking.

Something like that should work.
I wish there was some trivial futex wrapper in .h
that can be used instead of pthread_mutex baggage.
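
To illustrate the kind of header-only futex wrapper meant here, below is a rough Linux-only sketch of the classic three-state futex lock (0 = unlocked, 1 = locked, 2 = locked with possible waiters). It is not an existing kernel or libc API: struct futex_lock, futex_call(), futex_lock_acquire() and futex_lock_release() are made-up names, and error handling (EINTR, EAGAIN) is left out because the wait loop simply rechecks the state.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* State: 0 = unlocked, 1 = locked, 2 = locked with possible waiters. */
struct futex_lock {
    atomic_int state;
};

/* The futex syscall operates on a 32-bit word. */
static_assert(sizeof(atomic_int) == sizeof(int), "futex needs a 32-bit word");

static long futex_call(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void futex_lock_acquire(struct futex_lock *l)
{
    int c = 0;

    /* Fast path: uncontended 0 -> 1, no syscall. */
    if (atomic_compare_exchange_strong(&l->state, &c, 1))
        return;

    /* Slow path: mark the lock contended and sleep while it is held. */
    if (c != 2)
        c = atomic_exchange(&l->state, 2);
    while (c != 0) {
        futex_call(&l->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&l->state, 2);
    }
}

static void futex_lock_release(struct futex_lock *l)
{
    /* Enter the kernel only if someone may be waiting. */
    if (atomic_exchange(&l->state, 0) == 2)
        futex_call(&l->state, FUTEX_WAKE_PRIVATE, 1);
}
```

A zero-initialized struct futex_lock starts unlocked, so it can sit directly in shared library state without an init call.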