Message-ID: <CAMB2axPBsi=D3c+ddH0wcmOCC1SV=oMyZPM=+WXCqCnuDforsQ@mail.gmail.com>
Date: Fri, 2 May 2025 14:23:35 -0700
From: Amery Hung <ameryhung@...il.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Tejun Heo <tj@...nel.org>, bpf@...r.kernel.org, netdev@...r.kernel.org,
alexei.starovoitov@...il.com, andrii@...nel.org, daniel@...earbox.net,
martin.lau@...nel.org, kernel-team@...a.com
Subject: Re: [PATCH RFC v3 0/2] Task local data API
On Fri, May 2, 2025 at 1:11 PM Andrii Nakryiko
<andrii.nakryiko@...il.com> wrote:
>
> On Fri, May 2, 2025 at 11:36 AM Tejun Heo <tj@...nel.org> wrote:
> >
> > Hello,
> >
> > On Fri, May 02, 2025 at 09:14:47AM -0700, Andrii Nakryiko wrote:
> > > > The advantage of no memory wasted for threads that are not using TLD
> > > > doesn't seem to be that definite to me. If users add per-process
> > > > hints, then this scheme can potentially use a lot more memory (i.e.,
> > > > PAGE_SIZE * number of threads). Maybe we need another uptr for
> > > > per-process data? Or do you think this is out of the scope of TLD and
> > > > we should recommend other solutions?
> > >
> > > I'd keep it simple. One page per thread isn't a big deal at all, in my
> > > mind. If the application has a few threads, then a bunch of kilobytes
> > > is not a big deal. If the application has thousands of threads, then a
> > > few megabytes for this is the least of that application's concern,
> > > it's already heavy-weight as hell. I think we are overpivoting on
> > > saving a few bytes here.
> >
> > It could well be that 4k is a price worth paying but there will be cases
> > where this matters. With 100k threads - not common but not unheard of
> > either, that's ~400MB. If the data needed to be shared is small and most of
> > that is wasted, that's not an insignificant amount. uptr supports sub-page
> > sizing, right? If keeping sizing dynamic is too complex, can't a process
> > just set the max size to what it deems appropriate?
> >
>
> One page was just a maximum supportable size due to uptr stuff. But it
> can absolutely be (much) smaller than that, of course. The main
> simplification from having a single fixed-sized data area allocation
> is that an application can permanently cache an absolute pointer
> returned from tld_resolve_key(). If we allow resizing the data area,
> all previously returned pointers could be invalidated. So that's the
> only thing. But yeah, if we know that we won't need more than, say 64
> bytes, nothing prevents us from allocating just those 64 bytes (per
> participating thread) instead of an entire page.
>
Since users can add keys on the fly, it feels natural to allocate the
data area dynamically as well. Otherwise, there will always be a hard
trade-off between the data size limit and wasted memory.

We can tweak the implementation to allocate data dynamically. The two
user space APIs can remain almost the same, but users should no longer
cache the pointer returned from tld_resolve_ptr(). The only API
difference is that tld_off_t becomes an index into the metadata.

void *tld_resolve_ptr(tld_off_t idx) will allocate data area lazily.

- Record total tld data size in tld_metadata, data_sz.
- Use a __thread variable, th_data_sz, to keep track of allocated
  memory for the thread (can be a small number or 0 initially)
- If offs[idx] + szs[idx] > th_data_sz, resize the memory based on sum
  (can be exactly the same or roundup to the next power of 2 to prevent
  frequent reallocation)
- If offs[idx] + szs[idx] <= th_data_sz, return tld->data + offs[idx]
  (fast path)

The downside is data access overhead as pointers cannot be cached, but
I think it is an okay middle ground.
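
For illustration, the steps above could be sketched roughly as follows.
This is a minimal user space sketch under stated assumptions, not the
patch itself: realloc() stands in for however the uptr-backed area
would really be resized and re-registered, the names tld_add_key(),
TLD_MAX_KEYS, tld_meta, th_data and roundup_pow2() are made up for the
example, and alignment and concurrent key addition are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TLD_MAX_KEYS 16 /* hypothetical limit, for the sketch only */

typedef int tld_off_t; /* in this scheme: an index into the metadata */

/* Process-wide metadata: where each key lives and the total size. */
struct tld_metadata {
	size_t cnt;                /* number of keys defined so far */
	size_t data_sz;            /* total TLD data size of all keys */
	size_t offs[TLD_MAX_KEYS]; /* offset of each key in the data area */
	size_t szs[TLD_MAX_KEYS];  /* size of each key's data */
};

static struct tld_metadata tld_meta;

/* Per-thread data area and its currently allocated size (0 initially). */
static __thread void *th_data;
static __thread size_t th_data_sz;

/* Hypothetical helper: define a key of sz bytes, return its index or -1. */
static tld_off_t tld_add_key(size_t sz)
{
	if (tld_meta.cnt == TLD_MAX_KEYS)
		return -1;

	tld_off_t idx = (tld_off_t)tld_meta.cnt++;

	tld_meta.offs[idx] = tld_meta.data_sz;
	tld_meta.szs[idx] = sz;
	tld_meta.data_sz += sz;
	return idx;
}

static size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Return this thread's pointer for key idx, growing the area lazily. */
static void *tld_resolve_ptr(tld_off_t idx)
{
	if (idx < 0 || (size_t)idx >= tld_meta.cnt)
		return NULL;

	size_t end = tld_meta.offs[idx] + tld_meta.szs[idx];

	assert(end <= tld_meta.data_sz);

	if (end > th_data_sz) {
		/* Slow path: grow to the next power of 2 covering this key. */
		size_t new_sz = roundup_pow2(end);
		void *p = realloc(th_data, new_sz);

		if (!p)
			return NULL;
		/* Zero only the newly added tail; old contents are kept. */
		memset((char *)p + th_data_sz, 0, new_sz - th_data_sz);
		th_data = p;
		th_data_sz = new_sz;
	}

	/* Fast path: the key already fits in the allocated area. */
	return (char *)th_data + tld_meta.offs[idx];
}
```

Because realloc() may move the area, a pointer obtained before a
resize can go stale, which is exactly why callers would have to call
tld_resolve_ptr() on every access instead of caching the result.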
> > Thanks.
> >
> > --
> > tejun