Message-ID: <aSOAcMSYsQ22kPid@casper.infradead.org>
Date: Sun, 23 Nov 2025 21:45:20 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: oleg@...hat.com, brauner@...nel.org, linux-kernel@...r.kernel.org,
akpm@...ux-foundation.org, linux-mm@...ck.org
Subject: Re: [PATCH 0/3] further damage-control lack of clone scalability
On Sun, Nov 23, 2025 at 05:39:16PM +0100, Mateusz Guzik wrote:
> I have some recollection we talked about this on irc long time ago.
>
> It is my *suspicion* this would be best served with a sparse bitmap +
> a hash table.
Maybe! I've heard other people speculate that would be a better data
structure. I know we switched away from a hash table for the page
cache, but that has a different usage pattern where it's common to go
from page N to page N+1, N+2, ... Other than ps, I don't think we often
have that pattern for PIDs.
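
For concreteness, the pattern I mean is roughly the following (a sketch
in the spirit of next_tgid(), not the real code):

	#include <linux/pid.h>
	#include <linux/pid_namespace.h>
	#include <linux/rcupdate.h>

	/* Walk every live pid in ascending order, the way readdir on
	 * /proc effectively does.  find_ge_pid() is an ordered "next"
	 * lookup (idr_get_next() today, xa_find() in XArray terms);
	 * a plain hash table has no cheap equivalent of it. */
	static void walk_all_pids(struct pid_namespace *ns)
	{
		struct pid *pid;
		int nr = 1;

		rcu_read_lock();
		while ((pid = find_ge_pid(nr, ns)) != NULL) {
			/* ... emit one entry for this pid ... */
			nr = pid_nr_ns(pid, ns) + 1;
		}
		rcu_read_unlock();
	}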
> Such a solution was already present, but it got replaced by
> 95846ecf9dac5089 ("pid: replace pid bitmap implementation with IDR
> API").
>
> Commit message cites the following bench results:
> The following are the stats for ps, pstree and calling readdir on /proc
> for 10,000 processes.
>
> ps:
> With IDR API With bitmap
> real 0m1.479s 0m2.319s
> user 0m0.070s 0m0.060s
> sys 0m0.289s 0m0.516s
>
> pstree:
> With IDR API With bitmap
> real 0m1.024s 0m1.794s
> user 0m0.348s 0m0.612s
> sys 0m0.184s 0m0.264s
>
> proc:
> With IDR API With bitmap
> real 0m0.059s 0m0.074s
> user 0m0.000s 0m0.004s
> sys 0m0.016s 0m0.016s
>
> Impact on clone was not benchmarked afaics.
Would it be much effort for you to check out 95846ecf9dac5089
and 95846ecf9dac5089^ and run your benchmark on both? That seems
like the cheapest way of assessing the performance of hash+bitmap
vs IDR.
> Regardless, in order to give whatever replacement a fair perf eval
> against idr, at least the following 2 bits need to get sorted out:
> - the self-induced repeat locking of pidmap_lock
> - high cost of kmalloc (to my understanding waiting for sheaves4all)
The nice thing about XArray (compared to IDR) is that there's no
requirement to preallocate. Only 1.6% of xa_alloc() calls result in
calling slab. The downside is that this means the XArray needs to know
where its lock is (ie xa_lock) so that it can drop the lock in order to
allocate without using GFP_ATOMIC.
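
To make that concrete, here's a minimal sketch of pid number allocation
on top of xa_alloc() (names invented for illustration, not a patch):

	#include <linux/pid.h>
	#include <linux/threads.h>	/* PID_MAX_LIMIT */
	#include <linux/xarray.h>

	/* Hypothetical: one allocating XArray standing in for the IDR. */
	static DEFINE_XARRAY_ALLOC(example_pids);

	static int example_alloc_pid_nr(struct pid *pid)
	{
		u32 nr;
		int err;

		/* GFP_KERNEL is fine here: xa_alloc() takes xa_lock
		 * itself and drops it around the (rare) trip into slab,
		 * so the caller neither preloads nor needs GFP_ATOMIC. */
		err = xa_alloc(&example_pids, &nr, pid,
			       XA_LIMIT(1, PID_MAX_LIMIT - 1), GFP_KERNEL);
		return err ? err : nr;
	}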
At one point I kind of had a plan to create a multi-xarray: multiple
xarrays sharing a single lock. Or maybe this sharding is exactly
what's needed here; I haven't really analysed the pid locking to say.
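
If by sharding you mean something like an array of XArrays, each pid
hashed to a shard with its own xa_lock, a throwaway sketch would be
(all names hypothetical):

	#include <linux/hash.h>
	#include <linux/xarray.h>

	#define PID_SHARD_BITS	4			/* hypothetical */
	#define PID_SHARDS	(1 << PID_SHARD_BITS)

	static struct xarray pid_shards[PID_SHARDS];

	static void pid_shards_init(void)
	{
		int i;

		for (i = 0; i < PID_SHARDS; i++)
			xa_init_flags(&pid_shards[i], XA_FLAGS_ALLOC);
	}

	/* Only the shard owning a given pid's lock gets contended;
	 * the multi-xarray idea above is the opposite trade-off:
	 * several XArrays behind one shared lock. */
	static struct xarray *pid_shard(unsigned long nr)
	{
		return &pid_shards[hash_long(nr, PID_SHARD_BITS)];
	}

The catch is that an ordered walk (find_ge_pid() and friends) then has
to merge results across shards, which the current single IDR avoids.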