Message-ID: <f7c33974-e520-387e-9e2f-1e523bfe1545@gentwo.org>
Date: Tue, 4 Nov 2025 14:11:18 -0800 (PST)
From: "Christoph Lameter (Ampere)" <cl@...two.org>
To: Vlastimil Babka <vbabka@...e.cz>
cc: Andrew Morton <akpm@...ux-foundation.org>,
David Rientjes <rientjes@...gle.com>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Harry Yoo <harry.yoo@...cle.com>, Uladzislau Rezki <urezki@...il.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
Suren Baghdasaryan <surenb@...gle.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Alexei Starovoitov <ast@...nel.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-rt-devel@...ts.linux.dev,
bpf@...r.kernel.org, kasan-dev@...glegroups.com,
Alexander Potapenko <glider@...gle.com>, Marco Elver <elver@...gle.com>,
Dmitry Vyukov <dvyukov@...gle.com>
Subject: Re: [PATCH RFC 00/19] slab: replace cpu (partial) slabs with
sheaves
On Thu, 23 Oct 2025, Vlastimil Babka wrote:
> Besides (hopefully) improved performance, this removes the rather
> complicated code related to the lockless fastpaths (using
> this_cpu_try_cmpxchg128/64) and its complications with PREEMPT_RT or
> kmalloc_nolock().
Going back to a strict LIFO scheme for alloc/free removes the following
performance features:
1. Objects are served randomly from a variety of slab pages instead of
serving all available objects from a single slab page and then from the
next. This means that a larger set of TLB entries is needed to cover the
objects. TLB pressure will increase.
2. The number of partial slabs will increase since the free objects in a
partial page are not used up before moving on to the next. Instead, free
objects from random slab pages are used.
Spatial object locality is reduced. Temporal object hotness increases.
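
A toy user-space model may help make the two points above concrete. It is
only a sketch under stated assumptions, not kernel code: frees are assumed
to arrive round-robin across slabs, and NSLABS, OBJS_PER_SLAB and both
function names are made up for illustration. It counts how many distinct
slab pages a burst of allocations touches when objects come off a LIFO
stack versus when one slab is drained before moving on to the next.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes, chosen only to keep the model small. */
enum { NSLABS = 8, OBJS_PER_SLAB = 16, NOBJS = NSLABS * OBJS_PER_SLAB };

/*
 * LIFO model: every freed object is pushed on one stack and allocations
 * pop from the top. Returns the number of distinct slabs that the first
 * n allocations touch.
 */
int slabs_touched_lifo(int n)
{
    int stack[NOBJS];          /* owning slab of each freed object */
    bool seen[NSLABS] = { false };
    int top = 0, distinct = 0;

    assert(n >= 0 && n <= NOBJS);

    /* Assumed free pattern: object i belongs to slab i % NSLABS. */
    for (int i = 0; i < NOBJS; i++)
        stack[top++] = i % NSLABS;

    for (int k = 0; k < n; k++) {
        int slab = stack[--top];
        if (!seen[slab]) {
            seen[slab] = true;
            distinct++;
        }
    }
    return distinct;
}

/*
 * Per-slab model: frees go back to the owning slab, and allocations use
 * up all free objects of one slab before moving on to the next one.
 */
int slabs_touched_per_slab(int n)
{
    int free_count[NSLABS] = { 0 };
    bool seen[NSLABS] = { false };
    int cur = 0, distinct = 0;

    assert(n >= 0 && n <= NOBJS);

    /* Same assumed free pattern as in the LIFO model. */
    for (int i = 0; i < NOBJS; i++)
        free_count[i % NSLABS]++;

    for (int k = 0; k < n; k++) {
        while (free_count[cur] == 0)
            cur++;             /* current slab exhausted, take the next */
        free_count[cur]--;
        if (!seen[cur]) {
            seen[cur] = true;
            distinct++;
        }
    }
    return distinct;
}
```

With these sizes, 16 allocations touch all 8 slabs in the LIFO model and a
single slab in the per-slab model. A real workload will sit somewhere in
between, depending on how frees are actually distributed.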
> The lockless slab freelist+counters update operation using
> try_cmpxchg128/64 remains and is crucial for freeing remote NUMA objects
> without repeating the "alien" array flushing of SLUB, and to allow
> flushing objects from sheaves to slabs mostly without the node
> list_lock.
Hmm... So potentially cache-hot objects are lost that way and reused on
another node next. The role of the alien caches in SLAB was to cover that
case, and we saw performance regressions without these caches.
The method of freeing still reduces the number of remote partial slabs
that have to be managed and increases the locality of the objects.
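
For readers less familiar with the lockless free mentioned in the quoted
text, here is a minimal user-space sketch of the general idea using C11
atomics. It is an assumption-laden illustration, not the SLUB code: the
kernel updates the freelist pointer and the slab counters together with a
double-width cmpxchg, while this sketch only swaps a single pointer.
struct slab, struct object and both function names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* A free object stores the link to the next free object in itself. */
struct object {
    struct object *next;
};

/* Hypothetical, stripped-down slab: only a lockless freelist head. */
struct slab {
    _Atomic(struct object *) freelist;
};

/*
 * Return obj to the slab it came from without taking a lock. Any CPU on
 * any node can call this; the compare-exchange loop retries if another
 * CPU changed the freelist head in the meantime.
 */
void slab_free_lockless(struct slab *s, struct object *obj)
{
    struct object *old =
        atomic_load_explicit(&s->freelist, memory_order_relaxed);

    do {
        obj->next = old;       /* link in front of the current head */
    } while (!atomic_compare_exchange_weak_explicit(
                 &s->freelist, &old, obj,
                 memory_order_release, memory_order_relaxed));
}

/* Count the free objects; only meaningful while no one else modifies s. */
size_t slab_free_count(struct slab *s)
{
    size_t n = 0;

    for (struct object *o =
             atomic_load_explicit(&s->freelist, memory_order_acquire);
         o != NULL; o = o->next)
        n++;
    return n;
}
```

The object goes straight back to its own slab, so no per-node staging
array has to be flushed later. The cost is that the object is not kept
around for quick reuse on the node that freed it.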