Message-ID: <CAGudoHFjqRPao2DOF35rHrYDOAjVC+dcWJ2kGm+7JqnMNk=o2A@mail.gmail.com>
Date: Sun, 23 Nov 2025 17:39:16 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: oleg@...hat.com, brauner@...nel.org, linux-kernel@...r.kernel.org, 
	akpm@...ux-foundation.org, linux-mm@...ck.org
Subject: Re: [PATCH 0/3] further damage-control lack of clone scalability

On Sun, Nov 23, 2025 at 4:00 PM Matthew Wilcox <willy@...radead.org> wrote:
>
> On Sun, Nov 23, 2025 at 07:30:51AM +0100, Mateusz Guzik wrote:
> > When spawning and killing threads in separate processes in parallel the
> > primary bottleneck on the stock kernel is pidmap_lock, largely because
> > of a back-to-back acquire in the common case.
> >
> > Benchmark code at the end.
> >
> > With this patchset alloc_pid() only takes the lock once and consequently
> > alleviates the problem. While scalability improves, the lock remains the
> > primary bottleneck by a large margin.
> >
> > I believe idr is a poor choice for the task at hand to begin with, but
> > sorting that out is beyond the scope of this patchset. At the same time
> > any replacement would be best evaluated against a state where the
> > above relock problem is fixed.
>
> Good news!  The IDR is deprecated.  Bad news!  I'm not 100% sure that
> the XArray is quite appropriate for this usecase.  I am opposed to
> introducing more IDR APIs.  Have you looked at converting to the XArray?
> Or do you have a better data structure in mind than the XArray?
>

I have some recollection we talked about this on IRC a long time ago.

It is my *suspicion* this would be best served with a sparse bitmap +
a hash table.

Such a solution was already present, but it got replaced by
95846ecf9dac5089 ("pid: replace pid bitmap implementation with IDR
API").

Commit message cites the following bench results:
    The following are the stats for ps, pstree and calling readdir on /proc
    for 10,000 processes.

    ps:
            With IDR API    With bitmap
    real    0m1.479s        0m2.319s
    user    0m0.070s        0m0.060s
    sys     0m0.289s        0m0.516s

    pstree:
            With IDR API    With bitmap
    real    0m1.024s        0m1.794s
    user    0m0.348s        0m0.612s
    sys     0m0.184s        0m0.264s

    proc:
            With IDR API    With bitmap
    real    0m0.059s        0m0.074s
    user    0m0.000s        0m0.004s
    sys     0m0.016s        0m0.016s

Impact on clone was not benchmarked afaics.
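For context on the ps/readdir side: my understanding is that a
bitmap-backed /proc walk amounts to a find-next-bit scan over the pid
bitmap plus a hash lookup per pid found. A hypothetical sketch of the
scan half (not the removed kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Return the first pid >= from whose bit is set in bm, or -1 if none.
 * bm has nwords 64-bit words. */
static int next_pid(const uint64_t *bm, int nwords, int from)
{
	int word = from / 64;
	uint64_t w;

	if (word >= nwords)
		return -1;
	w = bm[word] & (~0ULL << (from % 64)); /* drop bits below 'from' */
	for (;;) {
		if (w)
			return word * 64 + __builtin_ctzll(w);
		if (++word >= nwords)
			return -1;
		w = bm[word];
	}
}
```

Each pid this yields then costs a hash-chain walk to reach the entry,
which is where I would expect poor table sizing or hashing to show up in
those numbers.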

It may be that a hash table is a bad pick here, but the commit does not
explain what (if anything) was done to try to make it work. As is, I
suspect both the sizing of the table itself and the hashing algo were
suboptimal for the above test.

I did find a patchset converting the thing to xarray here:
https://lore.kernel.org/all/20221202171620.509140-1-bfoster@redhat.com/
It looks like it would need a mildly annoying rebase.

The patchset does not come with any numbers for forking either, and it
predates other unscrewing I did in the area (which notably moved
tasklist_lock away from being the primary bottleneck). If I'm reading it
correctly, it takes lock trips per pid namespace level. This adds
overhead from both a single-threaded and a multi-threaded standpoint, in
particular it still has multiple acquires of the same lock. If it is
fine to use xarray and retain pidmap_lock, that's a different story; I
have not looked into it.

Regardless, in order to give whatever replacement a fair perf eval
against idr, at least the following 2 bits need to get sorted out:
- the self-induced repeat locking of pidmap_lock
- high cost of kmalloc (to my understanding waiting for sheaves4all)
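To make the first item concrete, here is a toy userspace model of the
relock pattern (hypothetical code, not the actual alloc_pid): the stock
flow reserves an id in one critical section and installs it in a second,
back-to-back one, while the fixed flow does both under a single acquire:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t pidmap_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_id = 1;
static int installed[64];
static int acquisitions;            /* counts lock round trips */

static void lock(void)
{
	pthread_mutex_lock(&pidmap_lock);
	acquisitions++;
}

static void unlock(void)
{
	pthread_mutex_unlock(&pidmap_lock);
}

/* stock-style: two back-to-back critical sections per alloc */
static int alloc_relock(void)
{
	lock();
	int id = next_id++;         /* reserve the id */
	unlock();

	lock();                     /* immediate reacquire */
	installed[id] = 1;          /* install it */
	unlock();
	return id;
}

/* patched-style: reserve and install under one acquire */
static int alloc_once(void)
{
	lock();
	int id = next_id++;
	installed[id] = 1;
	unlock();
	return id;
}
```

Under contention the relock variant doubles the number of times the
cache line holding the lock bounces between CPUs, which is why
collapsing to one acquire helps even though the total work inside is the
same.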
