Message-ID: <CAADnVQ+N5u8KTtcsKOWcDmQ4X=OmvTf+LfLhY=VLQJ7q-=Li7Q@mail.gmail.com>
Date: Fri, 2 May 2025 10:55:40 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Amery Hung <ameryhung@...il.com>, bpf <bpf@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>, Andrii Nakryiko <andrii@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, Tejun Heo <tj@...nel.org>,
Martin KaFai Lau <martin.lau@...nel.org>, Kernel Team <kernel-team@...a.com>
Subject: Re: [PATCH RFC v3 0/2] Task local data API
On Thu, May 1, 2025 at 7:22 PM Andrii Nakryiko
<andrii.nakryiko@...il.com> wrote:
>
> I wasn't trying to optimize those few bytes taken by szs, tbh.
> Allocating from the end of the page bakes in the assumption that we
> won't ever need more than one page. I don't know if I'd do that. But
> we can just track "next available offset" instead, so it doesn't
> really matter much.
Right. That works too.
> >
> > I'm not quite sure how different processes can do it locklessly.
>
> There are no different processes, it's all one process, many
> threads... Or is that what you meant? tld_metadata is *per process*,
> tld_data is *per thread*. Processes don't need to coordinate anything
> between themselves, only threads within the process.
Yeah. I confused myself thinking that we need to support this
across fork/exec. Since parent and child will be different processes,
they will have their own task local storage map elements,
so any kind of signaling into a child needs to be done in user space.
Using the bpf tls map won't work for that.
So using one "tld" library in multiple threads within a single
process works. No need to complicate things by involving the kernel tls map.
The "tld" library can keep whatever state it needs,
centralized locking, etc.
> As for how I'd do offset allocation and key addition locklessly. You
> are right that it can't be done completely locklessly, but just
> looping and yielding probably would be fine.
>
> Then the sequence of adding the key would be something like below.
> I've modified tld_metadata a bit to make this simpler and more
> economical (and I fixed definition of keys array of array of chars,
> oops):
>
> struct tld_metadata {
>     int cnt;
>     int next_off;
>     char keys[MAX_KEY_CNT][MAX_KEY_LEN];
>     __s16 offs[MAX_KEY_CNT]; /* signed, so that -1 can mean "not set yet" */
> };
>
> struct tld_metadata *m = ...;
> const char *new_key = ...;
> int i = 0, key_cnt, new_off;
>
> /* all m->offs[i] are set to -1 on creation */
> again:
>     key_cnt = m->cnt;
>     for (; i < key_cnt; i++) {
>         while (m->offs[i] < 0) /* update in progress */
>             sched_yield();
>
>         if (strcmp(m->keys[i], new_key) == 0)
>             return m->offs[i];
>     }
>
>     if (!cmpxchg(&m->cnt, key_cnt, key_cnt + 1))
>         goto again; /* we raced, key might have been added
>                        already, recheck, but keep i */
>
>     /* slot key_cnt is ours, we need to calculate and assign offset */
>     new_off = m->next_off;
>     m->next_off = new_off + key_sz;
>
>     m->keys[key_cnt][0] = '\0';
>     /* strncat appends up to n chars plus a NUL, hence the - 1 */
>     strncat(m->keys[key_cnt], new_key, MAX_KEY_LEN - 1);
>
>     /* MEMORY BARRIERS SHOULD BE CAREFULLY CONSIDERED */
>
>     m->offs[key_cnt] = new_off; /* this is finalizing key -> offset
>                                    assignment */
>
>     /* MEMORY BARRIERS SHOULD BE CAREFULLY CONSIDERED */
>
>     return new_off; /* we are done */
>
> Something like that. There is that looping and yield to not miss
> someone else winning the race and adding a key, so that's the locking
> part. But given that adding a key definition is supposed to be one
> time operation (per key), I don't think we should be fancy with
> locking.
something like that should work.
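
For illustration only, here is a rough C11 sketch of the key-addition sequence quoted above, with the memory-barrier placeholders turned into an acquire/release pair on offs[]. It is not the proposed implementation: the MAX_KEY_CNT and MAX_KEY_LEN values, the tld_metadata_init() helper and the tld_add_key() name are assumptions made up for this example. It also bumps next_off with an atomic fetch-add, on the assumption that two threads that win different slots could otherwise read the same next_off.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <string.h>

#define MAX_KEY_CNT 64  /* assumed limit, not taken from the thread */
#define MAX_KEY_LEN 32  /* assumed limit, not taken from the thread */

struct tld_metadata {
    atomic_int cnt;                  /* number of claimed key slots */
    atomic_int next_off;             /* next free offset in per-thread data */
    char keys[MAX_KEY_CNT][MAX_KEY_LEN];
    _Atomic short offs[MAX_KEY_CNT]; /* -1 until the slot is published */
};

/* Hypothetical helper: all offsets must start out as -1. */
static void tld_metadata_init(struct tld_metadata *m)
{
    atomic_init(&m->cnt, 0);
    atomic_init(&m->next_off, 0);
    memset(m->keys, 0, sizeof(m->keys));
    for (int i = 0; i < MAX_KEY_CNT; i++)
        atomic_init(&m->offs[i], -1);
}

/* Returns the offset of key, adding it if needed; -1 if no slot is left. */
static int tld_add_key(struct tld_metadata *m, const char *key, int key_sz)
{
    int i = 0;

    for (;;) {
        int key_cnt = atomic_load(&m->cnt);

        for (; i < key_cnt; i++) {
            /* acquire pairs with the release store that publishes a slot */
            while (atomic_load_explicit(&m->offs[i],
                                        memory_order_acquire) < 0)
                sched_yield(); /* another thread is still filling slot i */

            if (strcmp(m->keys[i], key) == 0)
                return atomic_load_explicit(&m->offs[i],
                                            memory_order_relaxed);
        }

        if (key_cnt >= MAX_KEY_CNT)
            return -1;

        /* try to claim slot key_cnt; if we lose, rescan only the new slots */
        if (atomic_compare_exchange_strong(&m->cnt, &key_cnt, key_cnt + 1)) {
            int new_off = atomic_fetch_add(&m->next_off, key_sz);

            strncpy(m->keys[key_cnt], key, MAX_KEY_LEN - 1);
            m->keys[key_cnt][MAX_KEY_LEN - 1] = '\0';

            /* publish: the key bytes become visible before the offset */
            atomic_store_explicit(&m->offs[key_cnt], (short)new_off,
                                  memory_order_release);
            return new_off;
        }
    }
}
```

The sketch does not check that new_off + key_sz still fits in the data area, and a thread that dies between claiming a slot and publishing its offset would leave the others spinning, so it only shows the ordering, not a complete design.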
I wish there was some trivial futex wrapper in .h
that can be used instead of pthread_mutex baggage.
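
To make the wish concrete, a header-only futex lock could look roughly like the sketch below. This is the well-known three-state futex mutex (0 = unlocked, 1 = locked, 2 = locked with possible waiters), not an existing library API: the tld_mutex names are made up for this example, it is Linux-only, and it assumes atomic_int has the same size and layout as the 32-bit word the futex syscall expects.

```c
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical type: 0 = unlocked, 1 = locked, 2 = locked, maybe waiters */
struct tld_mutex {
    atomic_int state;
};

/* Thin wrapper around the raw syscall; glibc does not export futex(). */
static long tld_futex(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void tld_mutex_lock(struct tld_mutex *l)
{
    int c = 0;

    /* fast path: 0 -> 1 without entering the kernel */
    if (atomic_compare_exchange_strong(&l->state, &c, 1))
        return;

    /* slow path: mark the lock contended, sleep until it is released */
    if (c != 2)
        c = atomic_exchange(&l->state, 2);
    while (c != 0) {
        /* sleeps only if state is still 2, otherwise returns at once */
        tld_futex(&l->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&l->state, 2);
    }
}

static void tld_mutex_unlock(struct tld_mutex *l)
{
    /* wake one waiter only if somebody may be sleeping */
    if (atomic_exchange(&l->state, 0) == 2)
        tld_futex(&l->state, FUTEX_WAKE_PRIVATE, 1);
}
```

The _PRIVATE futex ops match the case discussed here, where only threads of one process need to coordinate. Key registration would then just take the lock around the scan-and-append, with no cmpxchg or sched_yield() loop.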