linux-kernel - Re: [PATCH RFC v2 00/10] SLUB percpu sheaves

Message-ID: <ztssad52ikws3a2dwodju4o73h6rsutxnvzj5i6vyjjkudkiel@g7c7g5i3l7jd>
Date: Sat, 22 Feb 2025 19:19:41 -0500
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Suren Baghdasaryan <surenb@...gle.com>, 
	"Liam R. Howlett" <Liam.Howlett@...cle.com>, Christoph Lameter <cl@...ux.com>, 
	David Rientjes <rientjes@...gle.com>, Roman Gushchin <roman.gushchin@...ux.dev>, 
	Hyeonggon Yoo <42.hyeyoo@...il.com>, Uladzislau Rezki <urezki@...il.com>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, rcu@...r.kernel.org, maple-tree@...ts.infradead.org, 
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Alexei Starovoitov <ast@...nel.org>
Subject: Re: [PATCH RFC v2 00/10] SLUB percpu sheaves

On Fri, Feb 14, 2025 at 05:27:36PM +0100, Vlastimil Babka wrote:
> - Cheaper fast paths. For allocations, instead of local double cmpxchg,
>   after Patch 5 it's preempt_disable() and no atomic operations. Same for
>   freeing, which is normally a local double cmpxchg only for short-term
>   allocations (so the same slab is still active on the same cpu when
>   freeing the object) and a more costly locked double cmpxchg otherwise.
>   The downside is the lack of NUMA locality guarantees for the allocated
>   objects.

Is that really cheaper than a local, non-locked double cmpxchg?

Especially if you now have to use pushf/popf...
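For illustration, here is a minimal userspace sketch of the sheaf allocation fast path described in the quoted text. It assumes that a sheaf is simply a fixed-size per-CPU array of object pointers, and it uses `_Thread_local` as a stand-in for "this CPU with preemption disabled". The names `struct sheaf`, `SHEAF_CAPACITY`, `sheaf_alloc()`, `sheaf_free()` and the malloc()/free() slow path are hypothetical and are not the patch's API. The sketch shows why the fast path needs no atomic instruction: it is a bounds check plus a plain load or store. It does not model the existing SLUB fast path (a local double cmpxchg on freelist and tid), nor the cost of disabling preemption or irqs, so it says nothing about which path is actually cheaper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical capacity chosen for the example. */
#define SHEAF_CAPACITY 4

/* A sheaf modelled as a fixed-size stack of object pointers (one cache, one object size). */
struct sheaf {
	size_t size;                   /* number of cached objects */
	void *objects[SHEAF_CAPACITY];
};

/*
 * Userspace stand-in for the per-CPU sheaf: _Thread_local plays the role of
 * "this CPU with preemption disabled", so plain loads and stores suffice
 * and the fast path uses no atomic instructions.
 */
static _Thread_local struct sheaf cpu_sheaf;

/* Counts fallbacks to the backing allocator, for illustration only. */
static _Thread_local size_t slow_path_calls;

/* Fast path: pop a cached object if the sheaf is non-empty. */
static void *sheaf_alloc(size_t obj_size)
{
	if (cpu_sheaf.size > 0)
		return cpu_sheaf.objects[--cpu_sheaf.size];

	/* Hypothetical slow path: go to the backing allocator. */
	slow_path_calls++;
	return malloc(obj_size);
}

/* Fast path: push the object onto the sheaf if there is room. */
static void sheaf_free(void *obj)
{
	assert(obj != NULL);
	if (cpu_sheaf.size < SHEAF_CAPACITY) {
		cpu_sheaf.objects[cpu_sheaf.size++] = obj;
		return;
	}

	/* Hypothetical slow path: sheaf full, release the object directly. */
	slow_path_calls++;
	free(obj);
}
```

The first allocation from an empty sheaf takes the slow path. Freeing that object and allocating again returns the same pointer from the sheaf without touching the backing allocator.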

> - kfree_rcu() batching and recycling. kfree_rcu() will put objects into a
>   separate percpu sheaf and only submit the whole sheaf to call_rcu()
>   when full. After the grace period, the sheaf can be used for
>   allocations, which is more efficient than freeing and reallocating
>   individual slab objects (even with the batching done by kfree_rcu()
>   implementation itself). In case only some cpus are allowed to handle rcu
>   callbacks, the sheaf can still be made available to other cpus on the
>   same node via the shared barn. The maple_node cache uses kfree_rcu() and
>   thus can benefit from this.

Have you looked at fs/bcachefs/rcu_pending.c?
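The following is a similarly hedged, single-threaded userspace model of the kfree_rcu() batching described in the quoted text. Deferred frees are collected into an rcu_free sheaf. Once the sheaf is full, it is queued a single time, which stands in for one call_rcu() per sheaf rather than one per object. After a simulated grace period, the sheaf moves to a shared list, standing in for the barn, from which allocations can refill. All names here (`kfree_rcu_sketch()`, `grace_period_elapsed()`, `alloc_sketch()`, `barn_full`, `MAX_SHEAVES`) are hypothetical. RCU readers, locking, multiple CPUs and NUMA nodes are not modelled, and the sketch does not reflect the actual code in the patch series or in fs/bcachefs/rcu_pending.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SHEAF_CAPACITY 4 /* hypothetical sheaf size */
#define MAX_SHEAVES 8    /* hypothetical bound on queued sheaves in this model */

struct sheaf {
	size_t size;
	void *objects[SHEAF_CAPACITY];
};

/* Full sheaves waiting for a grace period (stand-in for queued call_rcu() callbacks). */
static struct sheaf *pending[MAX_SHEAVES];
static size_t nr_pending;

/* Full sheaves whose objects are safe to reuse (stand-in for the shared barn). */
static struct sheaf *barn_full[MAX_SHEAVES];
static size_t nr_barn_full;

static struct sheaf *rcu_free_sheaf; /* per-CPU in the described design */
static struct sheaf *main_sheaf;     /* allocation sheaf, also per-CPU there */

static struct sheaf *sheaf_new(void)
{
	struct sheaf *s = calloc(1, sizeof(*s));
	assert(s != NULL);
	return s;
}

/* Defer freeing obj by batching it into the rcu_free sheaf. */
static void kfree_rcu_sketch(void *obj)
{
	if (!rcu_free_sheaf)
		rcu_free_sheaf = sheaf_new();
	rcu_free_sheaf->objects[rcu_free_sheaf->size++] = obj;

	if (rcu_free_sheaf->size == SHEAF_CAPACITY) {
		/* Submit the whole sheaf once, instead of once per object. */
		assert(nr_pending < MAX_SHEAVES);
		pending[nr_pending++] = rcu_free_sheaf;
		rcu_free_sheaf = NULL;
	}
}

/* Simulated end of a grace period: pending sheaves become allocatable. */
static void grace_period_elapsed(void)
{
	while (nr_pending > 0) {
		assert(nr_barn_full < MAX_SHEAVES);
		barn_full[nr_barn_full++] = pending[--nr_pending];
	}
}

/* Allocate from the main sheaf, swapping in a full sheaf from the barn when empty. */
static void *alloc_sketch(size_t obj_size)
{
	if ((!main_sheaf || main_sheaf->size == 0) && nr_barn_full > 0) {
		/* The empty (or NULL) sheaf is simply discarded in this model. */
		free(main_sheaf);
		main_sheaf = barn_full[--nr_barn_full];
	}
	if (main_sheaf && main_sheaf->size > 0)
		return main_sheaf->objects[--main_sheaf->size];

	return malloc(obj_size); /* hypothetical slow path */
}
```

Freeing SHEAF_CAPACITY objects this way queues exactly one sheaf. After grace_period_elapsed(), the next alloc_sketch() call swaps in that whole sheaf and returns its most recently added object, instead of freeing and reallocating each object individually.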