Message-ID: <CAGudoHFjqRPao2DOF35rHrYDOAjVC+dcWJ2kGm+7JqnMNk=o2A@mail.gmail.com>
Date: Sun, 23 Nov 2025 17:39:16 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: oleg@...hat.com, brauner@...nel.org, linux-kernel@...r.kernel.org,
akpm@...ux-foundation.org, linux-mm@...ck.org
Subject: Re: [PATCH 0/3] further damage-control lack of clone scalability
On Sun, Nov 23, 2025 at 4:00 PM Matthew Wilcox <willy@...radead.org> wrote:
>
> On Sun, Nov 23, 2025 at 07:30:51AM +0100, Mateusz Guzik wrote:
> > When spawning and killing threads in separate processes in parallel the
> > primary bottleneck on the stock kernel is pidmap_lock, largely because
> > of a back-to-back acquire in the common case.
> >
> > Benchmark code at the end.
> >
> > With this patchset alloc_pid() only takes the lock once and consequently
> > alleviates the problem. While scalability improves, the lock remains the
> > primary bottleneck by a large margin.
> >
> > I believe idr is a poor choice for the task at hand to begin with, but
> > sorting that out is beyond the scope of this patchset. At the same time
> > any replacement would be best evaluated against a state where the
> > above relock problem is fixed.
>
> Good news! The IDR is deprecated. Bad news! I'm not 100% sure that
> the XArray is quite appropriate for this usecase. I am opposed to
> introducing more IDR APIs. Have you looked at converting to the XArray?
> Or do you have a better data structure in mind than the XArray?
>

I have some recollection that we talked about this on irc a long time ago.
It is my *suspicion* that this would be best served with a sparse bitmap +
a hash table (rough sketch further down).
Such a solution was already present, but it got replaced by
95846ecf9dac5089 ("pid: replace pid bitmap implementation with IDR
API").
The commit message cites the following bench results:

The following are the stats for ps, pstree and calling readdir on /proc
for 10,000 processes.

ps:
        With IDR API    With bitmap
real    0m1.479s        0m2.319s
user    0m0.070s        0m0.060s
sys     0m0.289s        0m0.516s

pstree:
        With IDR API    With bitmap
real    0m1.024s        0m1.794s
user    0m0.348s        0m0.612s
sys     0m0.184s        0m0.264s

proc:
        With IDR API    With bitmap
real    0m0.059s        0m0.074s
user    0m0.000s        0m0.004s
sys     0m0.016s        0m0.016s

Impact on clone was not benchmarked afaics.

It may be that a hash table is a bad pick here, but the commit does not
explain what (if anything) was done to try to make it work. As is, I
suspect both the sizing of the table itself and the hashing algo were
suboptimal for the above test.
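
To make the shape concrete, below is a rough sketch of the split I have
in mind: bitmap for allocation, hash purely for nr -> struct pid lookup.
This is illustrative only -- the struct fields and helpers here (pidmap,
last_pid, pid_hash, pid_hash_bits, hash_node) are made up for this mail,
the lazy page-at-a-time allocation which made the old bitmap "sparse" is
omitted, and so is all the namespace plumbing.

/* allocation: one scan of the bitmap under a single pidmap_lock acquire */
static int pid_bitmap_alloc(struct pid_namespace *ns)
{
	int nr;

	spin_lock_irq(&pidmap_lock);
	nr = find_next_zero_bit(ns->pidmap, pid_max, ns->last_pid + 1);
	if (nr >= pid_max) {
		/* wrap-around to RESERVED_PIDS elided */
		spin_unlock_irq(&pidmap_lock);
		return -EAGAIN;
	}
	set_bit(nr, ns->pidmap);
	ns->last_pid = nr;
	spin_unlock_irq(&pidmap_lock);
	return nr;
}

/* lookup: hash on the pid nr, walk a short chain under RCU */
static struct pid *pid_hash_lookup(struct pid_namespace *ns, int nr)
{
	struct hlist_head *head;
	struct pid *pid;

	head = &ns->pid_hash[hash_32(nr, ns->pid_hash_bits)];
	hlist_for_each_entry_rcu(pid, head, hash_node)
		if (pid->numbers[ns->level].nr == nr)
			return pid;
	return NULL;
}

Whether something along these lines wins or loses on the ps/pstree/readdir
walks above is mostly a question of table sizing and the hash function,
which is exactly the part the old commit does not say anything about.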

I did find a patchset converting the thing to xarray here:
https://lore.kernel.org/all/20221202171620.509140-1-bfoster@redhat.com/
It looks like it would need a mildly annoying rebase.

The patchset does not come with any numbers for forking either, and it
predates other unscrewing I did in the area (which notably moved the
tasklist lock away from being the primary bottleneck). If I'm reading it
correctly, it does a lock trip per pid namespace level. That adds
overhead from both a single-threaded and a multi-threaded standpoint, in
particular by still acquiring the same lock multiple times. If it is
fine to use xarray and retain pidmap_lock, that's a different story; I
have not looked into it.
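
For reference, a per-namespace xarray variant of the allocation loop
would look roughly like the fragment below. This is only a sketch of my
reading, not the patchset above: the 'pids' xarray and 'pid_next' cursor
fields are invented for this mail and error unwinding is dropped.

	struct pid_namespace *tmp = ns;
	int i;

	for (i = ns->level; i >= 0; i--) {
		u32 nr;
		int ret;

		/*
		 * xa_alloc_cyclic() takes and drops the xarray's internal
		 * lock, so a clone() in a nested namespace pays one lock
		 * round trip per level, and the pid still has to be
		 * published afterwards.
		 */
		ret = xa_alloc_cyclic(&tmp->pids, &nr, pid,
				      XA_LIMIT(1, pid_max), &tmp->pid_next,
				      GFP_KERNEL);
		if (ret < 0)
			goto out_free;

		pid->numbers[i].nr = nr;
		pid->numbers[i].ns = tmp;
		tmp = tmp->parent;
	}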

Regardless, in order to give whatever replacement a fair perf eval
against idr, at least the following 2 bits need to get sorted out (a
rough sketch of the first one follows the list):
- the self-induced repeat locking of pidmap_lock
- the high cost of kmalloc (to my understanding waiting for sheaves4all)
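
For the first point, the stock pattern is roughly the following
(paraphrased from kernel/pid.c with details trimmed, so don't take it
verbatim): the lock is taken inside the per-level loop and then again
right away to publish the pid, which is the back-to-back acquire in the
common single-level case.

	for (i = ns->level; i >= 0; i--) {
		idr_preload(GFP_KERNEL);
		spin_lock_irq(&pidmap_lock);
		nr = idr_alloc_cyclic(&tmp->idr, NULL, RESERVED_PIDS,
				      pid_max, GFP_ATOMIC);
		spin_unlock_irq(&pidmap_lock);
		idr_preload_end();
		/* per-level bookkeeping and tmp = tmp->parent trimmed */
	}

	spin_lock_irq(&pidmap_lock);
	for (upid = pid->numbers + ns->level; upid >= pid->numbers; --upid) {
		/* make the pid visible to find_pid_ns() */
		idr_replace(&upid->ns->idr, pid, upid->nr);
		upid->ns->pid_allocated++;
	}
	spin_unlock_irq(&pidmap_lock);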