Message-ID: <aSOAcMSYsQ22kPid@casper.infradead.org>
Date: Sun, 23 Nov 2025 21:45:20 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: oleg@...hat.com, brauner@...nel.org, linux-kernel@...r.kernel.org,
akpm@...ux-foundation.org, linux-mm@...ck.org
Subject: Re: [PATCH 0/3] further damage-control lack of clone scalability
On Sun, Nov 23, 2025 at 05:39:16PM +0100, Mateusz Guzik wrote:
> I have some recollection we talked about this on irc long time ago.
>
> It is my *suspicion* this would be best served with a sparse bitmap +
> a hash table.
Maybe! I've heard other people speculate that would be a better data
structure. I know we switched away from a hash table for the page
cache, but that has a different usage pattern where it's common to go
from page N to page N+1, N+2, ... Other than ps, I don't think we often
have that pattern for PIDs.
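
For concreteness, the pattern I mean is roughly the following (a sketch
in the spirit of next_tgid(), not the real code):

	#include <linux/pid.h>
	#include <linux/pid_namespace.h>
	#include <linux/rcupdate.h>

	/* Walk every live pid in ascending order, the way readdir on
	 * /proc effectively does.  find_ge_pid() is an ordered "next"
	 * lookup (idr_get_next() today, xa_find() in XArray terms);
	 * a plain hash table has no cheap equivalent of it. */
	static void walk_all_pids(struct pid_namespace *ns)
	{
		struct pid *pid;
		int nr = 1;

		rcu_read_lock();
		while ((pid = find_ge_pid(nr, ns)) != NULL) {
			/* ... emit one entry for this pid ... */
			nr = pid_nr_ns(pid, ns) + 1;
		}
		rcu_read_unlock();
	}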
> Such a solution was already present, but it got replaced by
> 95846ecf9dac5089 ("pid: replace pid bitmap implementation with IDR
> API").
>
> Commit message cites the following bench results:
> The following are the stats for ps, pstree and calling readdir on /proc
> for 10,000 processes.
>
> ps:
> With IDR API With bitmap
> real 0m1.479s 0m2.319s
> user 0m0.070s 0m0.060s
> sys 0m0.289s 0m0.516s
>
> pstree:
> With IDR API With bitmap
> real 0m1.024s 0m1.794s
> user 0m0.348s 0m0.612s
> sys 0m0.184s 0m0.264s
>
> proc:
> With IDR API With bitmap
> real 0m0.059s 0m0.074s
> user 0m0.000s 0m0.004s
> sys 0m0.016s 0m0.016s
>
> Impact on clone was not benchmarked afaics.
Would it be much effort for you to check out 95846ecf9dac5089
and 95846ecf9dac5089^ and run your benchmark on both? That seems
like the cheapest way of assessing the performance of hash+bitmap
vs IDR.
> Regardless, in order to give whatever replacement a fair perf eval
> against idr, at least the following 2 bits need to get sorted out:
> - the self-induced repeat locking of pidmap_lock
> - high cost of kmalloc (to my understanding waiting for sheaves4all)
The nice thing about XArray (compared to IDR) is that there's no
requirement to preallocate. Only 1.6% of xa_alloc() calls result in
calling slab. The downside is that this means the XArray needs to know
where its lock is (ie xa_lock) so that it can drop the lock in order to
allocate without using GFP_ATOMIC.
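
To make that concrete, here's a minimal sketch of pid number allocation
on top of xa_alloc() (names invented for illustration, not a patch):

	#include <linux/pid.h>
	#include <linux/threads.h>	/* PID_MAX_LIMIT */
	#include <linux/xarray.h>

	/* Hypothetical: one allocating XArray standing in for the IDR. */
	static DEFINE_XARRAY_ALLOC(example_pids);

	static int example_alloc_pid_nr(struct pid *pid)
	{
		u32 nr;
		int err;

		/* GFP_KERNEL is fine here: xa_alloc() takes xa_lock
		 * itself and drops it around the (rare) trip into slab,
		 * so the caller neither preloads nor needs GFP_ATOMIC. */
		err = xa_alloc(&example_pids, &nr, pid,
			       XA_LIMIT(1, PID_MAX_LIMIT - 1), GFP_KERNEL);
		return err ? err : nr;
	}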
At one point I kind of had a plan to create a multi-xarray: multiple
xarrays sharing a single lock. Or maybe this sharding is exactly
what's needed here; I haven't really analysed the pid locking to say.
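
If by sharding you mean something like an array of XArrays, each pid
hashed to a shard with its own xa_lock, a throwaway sketch would be
(all names hypothetical):

	#include <linux/hash.h>
	#include <linux/xarray.h>

	#define PID_SHARD_BITS	4			/* hypothetical */
	#define PID_SHARDS	(1 << PID_SHARD_BITS)

	static struct xarray pid_shards[PID_SHARDS];

	static void pid_shards_init(void)
	{
		int i;

		for (i = 0; i < PID_SHARDS; i++)
			xa_init_flags(&pid_shards[i], XA_FLAGS_ALLOC);
	}

	/* Only the shard owning a given pid's lock gets contended;
	 * the multi-xarray idea above is the opposite trade-off:
	 * several XArrays behind one shared lock. */
	static struct xarray *pid_shard(unsigned long nr)
	{
		return &pid_shards[hash_long(nr, PID_SHARD_BITS)];
	}

The catch is that an ordered walk (find_ge_pid() and friends) then has
to merge results across shards, which the current single IDR avoids.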