Message-ID: <aozlag7qiwbdezzjgw3bq73ihnkeppmc5iy4hq7zosg3zyalih@ieo3a4qecfxg>
Date: Fri, 30 Jan 2026 00:06:54 +0800
From: Hao Li <hao.li@...ux.dev>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Harry Yoo <harry.yoo@...cle.com>, Petr Tesarik <ptesarik@...e.com>,
Christoph Lameter <cl@...two.org>, David Rientjes <rientjes@...gle.com>,
Roman Gushchin <roman.gushchin@...ux.dev>, Andrew Morton <akpm@...ux-foundation.org>,
Uladzislau Rezki <urezki@...il.com>, "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Suren Baghdasaryan <surenb@...gle.com>, Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Alexei Starovoitov <ast@...nel.org>, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
linux-rt-devel@...ts.linux.dev, bpf@...r.kernel.org, kasan-dev@...glegroups.com,
kernel test robot <oliver.sang@...el.com>, stable@...r.kernel.org, "Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH v4 00/22] slab: replace cpu (partial) slabs with sheaves
On Thu, Jan 29, 2026 at 04:28:01PM +0100, Vlastimil Babka wrote:
> On 1/29/26 16:18, Hao Li wrote:
> > Hi Vlastimil,
> >
> > I conducted a detailed performance evaluation of each patch on my setup.
>
> Thanks! What was the benchmark(s) used?
I'm currently using the mmap2 test case from will-it-scale. The machine is still
the AMD 2-socket system with 2 nodes per socket and 192 CPUs in total, with SMT
disabled. I ran each test three times: with 64, 128, and 192 processes.
> Importantly, does it rely on vma/maple_node objects?
Yes, this test primarily puts a lot of pressure on maple_node.
> So previously those would become kind of double
> cached by both sheaves and cpu (partial) slabs (and thus hopefully benefited
> more than they should) since sheaves introduction in 6.18, and now they are
> not double cached anymore?
Exactly. Since version 6.18, maple_node has indeed benefited from a dual-layer
cache.
I did wonder whether this is not really a performance regression, but rather
performance returning to its baseline after one layer of caching was removed.
However, verifying this idea would require completely disabling the sheaf
mechanism on version 6.19-rc5 while leaving the rest of the SLUB code untouched.
It would be great to hear any suggestions on how this might be approached.
>
> > During my tests, I observed two points in the series where performance
> > regressions occurred:
> >
> > Patch 10: I noticed a ~16% regression in my environment. My hypothesis is
> > that with this patch, the allocation fast path bypasses the percpu partial
> > list, leading to increased contention on the node list.
>
> That makes sense.
>
> > Patch 12: This patch seems to introduce an additional ~9.7% regression. I
> > suspect this might be because the free path also loses buffering from the
> > percpu partial list, further exacerbating node list contention.
>
> Hmm yeah... we did put the previously full slabs there, avoiding the lock.
>
> > These are the only two patches in the series where I observed noticeable
> > regressions. The rest of the patches did not show significant performance
> > changes in my tests.
> >
> > I hope these test results are helpful.
>
> They are, thanks. I'd however hope it's just some particular test that has
> these regressions,
Yes, I hope so too. And the mmap2 test case is indeed quite extreme.
> which can be explained by the loss of double caching.
If we could compare it with a version that only uses the
CPU partial list, the answer might become clearer.
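
For reference, here is a rough sketch of how the two per-patch numbers quoted
above would combine. It assumes each percentage is measured relative to the
patch immediately before it, which is not stated explicitly above; if both
were instead measured against the same pre-series baseline, they would simply
add. The function name is made up for illustration.

```python
def cumulative_regression(step_regressions):
    """Combine per-step throughput regressions (as fractions) multiplicatively.

    Assumption: each step is relative to the state right before it.
    """
    remaining = 1.0  # fraction of the original throughput still left
    for r in step_regressions:
        remaining *= 1.0 - r
    return 1.0 - remaining


# ~16% at patch 10, then an additional ~9.7% at patch 12 (figures from above)
total = cumulative_regression([0.16, 0.097])
print(f"cumulative regression: {total:.1%}")  # cumulative regression: 24.1%
```

Under that assumption, throughput after patch 12 would be roughly 76% of the
pre-series level for this particular test.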