Message-ID: <f1a1b7ba-c084-4e41-9242-8255f2664b76@efficios.com>
Date: Fri, 28 Nov 2025 08:30:08 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Gabriel Krisman Bertazi <krisman@...e.de>, linux-mm@...ck.org
Cc: linux-kernel@...r.kernel.org, jack@...e.cz,
Mateusz Guzik <mjguzik@...il.com>, Shakeel Butt <shakeel.butt@...ux.dev>,
Michal Hocko <mhocko@...nel.org>, Dennis Zhou <dennis@...nel.org>,
Tejun Heo <tj@...nel.org>, Christoph Lameter <cl@...two.org>,
Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka
<vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [RFC PATCH 0/4] Optimize rss_stat initialization/teardown for
single-threaded tasks
On 2025-11-27 18:36, Gabriel Krisman Bertazi wrote:
> The cost of the pcpu memory allocation is non-negligible for systems
> with many cpus, and it is quite visible when forking a new task, as
> reported in a few occasions.
I came to the same conclusion while developing
the hierarchical per-cpu counters.
But while the mm_struct has a SLAB cache (initialized in
kernel/fork.c:mm_cache_init()), there is no such thing
for the per-mm per-cpu data.
In the mm_struct, we have the following per-cpu data (please
let me know if I missed any in the maze):
- struct mm_cid __percpu *pcpu_cid (or equivalent through
struct mm_mm_cid after Thomas Gleixner gets his rewrite
upstream),
- unsigned int __percpu *futex_ref,
- NR_MM_COUNTERS rss_stat per-cpu counters.
What would really reduce memory allocation overhead on fork
is to move all those fields into a top-level
"struct mm_percpu_struct" as a first step. This would
merge 3 per-cpu allocations into one when forking a new
task.
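As a rough userspace illustration of that first step (not kernel code):
the field types below are simplified stand-ins, NR_MM_COUNTERS is
hard-coded to an assumed value, calloc() over nr_cpus elements models
alloc_percpu(), and mm_percpu_alloc() is a hypothetical helper name.

```c
#include <stdlib.h>

#define NR_MM_COUNTERS 4	/* assumed value, for illustration only */

/* Simplified stand-in for the kernel's struct mm_cid. */
struct mm_cid {
	int cid;
};

/*
 * Hypothetical container: one instance per CPU, grouping the per-mm
 * per-cpu fields that are currently allocated separately.
 */
struct mm_percpu_struct {
	struct mm_cid pcpu_cid;
	unsigned int futex_ref;
	long rss_stat[NR_MM_COUNTERS];	/* stand-in for the rss counters */
};

/*
 * One zeroed allocation covering all CPUs instead of three.
 * calloc() models alloc_percpu() here; returns NULL on failure.
 */
static struct mm_percpu_struct *mm_percpu_alloc(unsigned int nr_cpus)
{
	return calloc(nr_cpus, sizeof(struct mm_percpu_struct));
}
```

With this layout, fork would pay for a single per-cpu allocation and a
single failure path rather than three.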
Then the second step is to create a mm_percpu_struct
cache to bypass the per-cpu allocator.
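A minimal userspace sketch of the caching idea, under stated
assumptions: freed blocks are kept on a free list and reused before
falling back to the allocator. All names here (mm_percpu_block,
mm_percpu_cache, mm_percpu_cache_get/put) are hypothetical, calloc()
stands in for the per-cpu allocator, and there is no locking, which a
real implementation would need.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-cpu payload; reduced to two fields for brevity. */
struct mm_percpu_struct {
	unsigned int futex_ref;
	long rss_stat[4];
};

/* A block holds nr_cpus payload entries plus a free-list link. */
struct mm_percpu_block {
	struct mm_percpu_block *next;
	struct mm_percpu_struct cpu[];
};

struct mm_percpu_cache {
	struct mm_percpu_block *free_list;
	unsigned int nr_cpus;
};

static struct mm_percpu_block *mm_percpu_cache_get(struct mm_percpu_cache *c)
{
	struct mm_percpu_block *b = c->free_list;

	if (b) {
		/* Fast path: reuse a cached block, no allocator call. */
		c->free_list = b->next;
		b->next = NULL;
		return b;
	}
	/* Slow path: calloc() models the per-cpu allocator. */
	return calloc(1, sizeof(*b) + c->nr_cpus * sizeof(b->cpu[0]));
}

static void mm_percpu_cache_put(struct mm_percpu_cache *c,
				struct mm_percpu_block *b)
{
	/* Zero on release so the next user gets a clean block. */
	memset(b->cpu, 0, c->nr_cpus * sizeof(b->cpu[0]));
	b->next = c->free_list;
	c->free_list = b;
}
```

The point of the sketch is only that a fork following an exit can take
the fast path and skip the per-cpu allocator entirely.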
I suspect that by doing just that we'd get most of the
performance benefits provided by the single-threaded special-case
proposed here.
I'm not against special-casing single-threaded tasks if it's still
worth it after doing the underlying data structure layout/caching
changes I'm proposing here, but I think we need to fix the
memory allocation overhead issue first before working around it
with special cases and added complexity.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com