Message-ID: <0c8cc83bb73abf080faf584f319008b67d0931db.camel@linaro.org>
Date: Wed, 30 Jul 2025 13:20:18 +0100
From: André Draszik <andre.draszik@...aro.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
linux-kernel@...r.kernel.org
Cc: André Almeida <andrealmeid@...lia.com>, Darren Hart
<dvhart@...radead.org>, Davidlohr Bueso <dave@...olabs.net>, Ingo Molnar
<mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>, Peter Zijlstra
<peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>, Valentin
Schneider <vschneid@...hat.com>, Waiman Long <longman@...hat.com>, Andrew
Morton <akpm@...ux-foundation.org>, David Hildenbrand <david@...hat.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, Lorenzo Stoakes
<lorenzo.stoakes@...cle.com>, Michal Hocko <mhocko@...e.com>, Mike
Rapoport <rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>, linux-mm@...ck.org
Subject: Re: [PATCH v2 2/6] futex: Use RCU-based per-CPU reference counting instead of rcuref_t

On Thu, 2025-07-10 at 13:00 +0200, Sebastian Andrzej Siewior wrote:
> From: Peter Zijlstra <peterz@...radead.org>
>
> The use of rcuref_t for reference counting introduces a performance bottleneck
> when accessed concurrently by multiple threads during futex operations.
>
> Replace rcuref_t with specially crafted per-CPU reference counters. The
> lifetime logic remains the same.
>
> The newly allocated private hash starts in FR_PERCPU state. In this state, each
> futex operation that requires the private hash uses a per-CPU counter (an
> unsigned int) for incrementing or decrementing the reference count.
>
> When the private hash is about to be replaced, the per-CPU counters are
> migrated to an atomic_t counter, mm_struct::futex_atomic.
> The migration process:
> - Waiting for one RCU grace period to ensure all users observe the
>   current private hash. This can be skipped if a grace period elapsed
>   since the private hash was assigned.
>
> - futex_private_hash::state is set to FR_ATOMIC, forcing all users to
>   use mm_struct::futex_atomic for reference counting.
>
> - After an RCU grace period, all users are guaranteed to be using the
>   atomic counter. The per-CPU counters can now be summed up and added to
>   the atomic_t counter. If the resulting count is zero, the hash can be
>   safely replaced. Otherwise, active users still hold a valid reference.
>
> - Once the atomic reference count drops to zero, the next futex
>   operation will switch to the new private hash.
>
> call_rcu_hurry() is used to speed up the transition, which otherwise might be
> delayed with RCU_LAZY. There is nothing wrong with using call_rcu(). The
> side effects would be that on auto scaling the new hash is used later
> and the SET_SLOTS prctl() will block longer.
>
> [bigeasy: commit description + mm get/ put_async]
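
The migration steps quoted above can be sketched in plain userspace C. This is a single-threaded illustration only, under stated assumptions: a fixed NCPU, no RCU at all, and hypothetical names (sketch_ref, sketch_get, sketch_put, sketch_switch_to_atomic). It is not the kernel implementation. What it tries to show is why an unsigned per-CPU counter is sufficient: a get on one CPU and a put on another leave the individual counters wrapped, but their sum modulo 2^32 is still the number of live references.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NCPU 4	/* hypothetical CPU count, for the sketch only */

enum ref_state { FR_PERCPU, FR_ATOMIC };

struct sketch_ref {
	enum ref_state state;
	unsigned int percpu[NCPU];	/* stand-in for the per-CPU counters */
	atomic_long atomic;		/* stand-in for mm_struct::futex_atomic */
};

/* Take a reference on behalf of the given (simulated) CPU. */
static void sketch_get(struct sketch_ref *r, int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);
	if (r->state == FR_PERCPU)
		r->percpu[cpu]++;
	else
		atomic_fetch_add(&r->atomic, 1);
}

/* Drop a reference; returns true if the atomic count reached zero. */
static bool sketch_put(struct sketch_ref *r, int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);
	if (r->state == FR_PERCPU) {
		r->percpu[cpu]--;	/* may wrap; only the sum matters */
		return false;
	}
	return atomic_fetch_sub(&r->atomic, 1) == 1;
}

/*
 * Switch to FR_ATOMIC and fold the per-CPU counters into the atomic
 * counter. The kernel waits for RCU grace periods around the state
 * change; this sketch has no concurrency, so it folds immediately.
 * Returns true if no references remain after the fold.
 */
static bool sketch_switch_to_atomic(struct sketch_ref *r)
{
	unsigned int sum = 0;

	r->state = FR_ATOMIC;
	for (int cpu = 0; cpu < NCPU; cpu++) {
		sum += r->percpu[cpu];	/* unsigned, so wraparound cancels out */
		r->percpu[cpu] = 0;
	}
	return atomic_fetch_add(&r->atomic, (long)sum) + (long)sum == 0;
}
```

For example, a get on CPU 0, a put on CPU 1 and a get on CPU 2 leave one reference outstanding even though the counter for CPU 1 has wrapped. sketch_switch_to_atomic() then reports that the hash is still in use, and the next sketch_put() reports that the count dropped to zero.
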
kmemleak complains about a new memleak with this commit:

[ 680.179004][ T101] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

$ cat /sys/kernel/debug/kmemleak
unreferenced object (percpu) 0xc22ec0eface8 (size 4):
  comm "swapper/0", pid 1, jiffies 4294893115
  hex dump (first 4 bytes on cpu 7):
    01 00 00 00                                      ....
  backtrace (crc b8bc6765):
    kmemleak_alloc_percpu+0x48/0xb8
    pcpu_alloc_noprof+0x6ac/0xb68
    futex_mm_init+0x60/0xe0
    mm_init+0x1e8/0x3c0
    mm_alloc+0x5c/0x78
    init_args+0x74/0x4b0
    debug_vm_pgtable+0x60/0x2d8
    do_one_initcall+0x128/0x3e0
    do_initcall_level+0xb4/0xe8
    do_initcalls+0x60/0xb0
    do_basic_setup+0x28/0x40
    kernel_init_freeable+0x158/0x1f8
    kernel_init+0x2c/0x1e0
    ret_from_fork+0x10/0x20

And futex_mm_init+0x60/0xe0 resolves to

    mm->futex_ref = alloc_percpu(unsigned int);

in futex_mm_init().
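
For illustration only, here is a userspace sketch of the general pattern kmemleak is flagging: an init function allocates a counter, and the object stays reachable only until the owner goes away, so every teardown path has to free it. All names here (fake_mm, fake_mm_init, fake_mm_release, live_allocs) are hypothetical, calloc()/free() merely stand in for the per-CPU allocator, and nothing in this report establishes which kernel teardown path, if any, is missing the free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping so the sketch can show outstanding allocations. */
static int live_allocs;

/* Hypothetical stand-in for the part of mm_struct that matters here. */
struct fake_mm {
	unsigned int *futex_ref;
};

/* Mirrors the shape of the allocation in the backtrace above. */
static int fake_mm_init(struct fake_mm *mm)
{
	mm->futex_ref = calloc(1, sizeof(*mm->futex_ref));
	if (!mm->futex_ref)
		return -1;
	live_allocs++;
	return 0;
}

/* Every path that destroys a fake_mm must end up here. */
static void fake_mm_release(struct fake_mm *mm)
{
	if (!mm->futex_ref)
		return;
	assert(live_allocs > 0);
	free(mm->futex_ref);
	mm->futex_ref = NULL;
	live_allocs--;
}
```

If a caller runs fake_mm_init() and then discards the structure without calling fake_mm_release(), live_allocs stays at 1, which is the userspace analogue of the "unreferenced object" entry above.
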

Reverting this commit (and patches 3 and 4 in this series, due to context)
makes kmemleak happy again.

Cheers,
Andre'