Message-ID: <3b7b610d-6482-49f0-8e46-6ae553bf8b98@suse.cz>
Date: Mon, 12 Jan 2026 11:55:08 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: "Christoph Lameter (Ampere)" <cl@...two.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
David Rientjes <rientjes@...gle.com>,
Roman Gushchin <roman.gushchin@...ux.dev>, Harry Yoo <harry.yoo@...cle.com>,
Uladzislau Rezki <urezki@...il.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
Suren Baghdasaryan <surenb@...gle.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Alexei Starovoitov <ast@...nel.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-rt-devel@...ts.linux.dev,
bpf@...r.kernel.org, kasan-dev@...glegroups.com,
Alexander Potapenko <glider@...gle.com>, Marco Elver <elver@...gle.com>,
Dmitry Vyukov <dvyukov@...gle.com>, Mike Rapoport <rppt@...nel.org>
Subject: Re: [PATCH RFC 00/19] slab: replace cpu (partial) slabs with sheaves
On 11/4/25 23:11, Christoph Lameter (Ampere) wrote:
> On Thu, 23 Oct 2025, Vlastimil Babka wrote:
>
>> Besides (hopefully) improved performance, this removes the rather
>> complicated code related to the lockless fastpaths (using
>> this_cpu_try_cmpxchg128/64) and its complications with PREEMPT_RT or
>> kmalloc_nolock().
Sorry for the late reply, and thanks for the insights; I will incorporate
them into the cover letter.
> Going back to a strict LIFO scheme for alloc/free removes the following
> performance features:
>
> 1. Objects are served randomly from a variety of slab pages instead of
> serving all available objects from a single slab page and then from the
> next. This means that the objects require a larger set of TLB entries to
> cover. TLB pressure will increase.
OK. That should hopefully be mitigated by the huge direct mappings. Also
IIRC, when Mike was evaluating patches to better preserve the huge mappings
against splitting, the benefits were so low that the effort was abandoned,
which suggests the TLB pressure on the direct map isn't that bad.
> 2. The number of partial slabs will increase since the free objects in a
> partial page are not used up before moving onto the next. Instead free
> objects from random slab pages are used.
Agreed. Should be bounded by the number of cpu+barn sheaves.
> Spatial object locality is reduced. Temporal object hotness increases.
Ack.
>> The lockless slab freelist+counters update operation using
>> try_cmpxchg128/64 remains and is crucial for freeing remote NUMA objects
>> without repeating the "alien" array flushing of SLUB, and to allow
>> flushing objects from sheaves to slabs mostly without the node
>> list_lock.
>
> Hmm... So potential cache hot objects are lost that way and reused on
> another node next. The role of the alien caches in SLAB was to cover that
> case and we saw performance regressions without these caches.
Interesting observation. I think commit e00946fe2351 ("[PATCH] slab: Bypass
free lists for __drain_alien_cache()") is relevant here?

But I wonder: wouldn't the objects tend to be cache hot on the cpu that was
freeing them (to which they were remote), and then, after the alien->shared
array transfer, be reallocated on a different cpu (to which they are local)?
So I wouldn't expect cache hotness benefits there.
> The method of freeing still reduces the amount of remote partial slabs
> that have to be managed and increases the locality of the objects.
Ack.