Message-ID: <1f19b775-d670-40ef-9147-2dcdce62b56e@kernel.org>
Date: Fri, 28 Nov 2025 13:38:34 +0100
From: Daniel Gomez <da.gomez@...nel.org>
To: Harry Yoo <harry.yoo@...cle.com>, surenb@...gle.com
Cc: Liam.Howlett@...cle.com, atomlin@...mlin.com, bpf@...r.kernel.org,
cl@...two.org, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-modules@...r.kernel.org, lucas.demarchi@...el.com,
maple-tree@...ts.infradead.org, mcgrof@...nel.org, petr.pavlu@...e.com,
rcu@...r.kernel.org, rientjes@...gle.com, roman.gushchin@...ux.dev,
samitolvanen@...gle.com, sidhartha.kumar@...cle.com, urezki@...il.com,
vbabka@...e.cz, jonathanh@...dia.com
Subject: Re: [PATCH V1] mm/slab: introduce kvfree_rcu_barrier_on_cache() for
cache destruction
On 28/11/2025 12.37, Harry Yoo wrote:
> Currently, kvfree_rcu_barrier() flushes the RCU sheaves of all slab
> caches when a cache is destroyed. This is unnecessary: only the RCU
> sheaves belonging to the cache being destroyed need to be flushed.
>
> As suggested by Vlastimil Babka, introduce a weaker form of
> kvfree_rcu_barrier() that operates on a specific slab cache and call it
> on cache destruction.
>
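For anyone skimming the thread, here is my rough sketch of the shape of the
change (not the actual diff; the flush_*() helper names below are hypothetical,
just to illustrate the before/after):

/*
 * Rough sketch only -- not the actual patch. flush_all_rcu_sheaves()
 * and flush_rcu_sheaves_on_cache() are hypothetical helper names.
 */

/* Before: the barrier flushes the RCU sheaves of every slab cache. */
void kvfree_rcu_barrier(void)
{
	flush_all_rcu_sheaves();	/* walks all slab caches */
	/* ... wait for in-flight kvfree_rcu() batches ... */
}

/* After: a weaker barrier limited to a single cache. */
void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
{
	flush_rcu_sheaves_on_cache(s);	/* only this cache's sheaves */
	/* ... wait for in-flight kvfree_rcu() batches ... */
}

void kmem_cache_destroy(struct kmem_cache *s)
{
	/* ... */
	kvfree_rcu_barrier_on_cache(s);	/* was: kvfree_rcu_barrier() */
	/* ... proceed with cache teardown ... */
}
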
> The performance benefit was evaluated on a 12-core, 24-thread AMD Ryzen
> 5900X machine (1 socket) by loading the slub_kunit module.
>
> Before:
> Total calls: 19
> Average latency (us): 8529
> Total time (us): 162069
>
> After:
> Total calls: 19
> Average latency (us): 3804
> Total time (us): 72287
>
> Link: https://lore.kernel.org/linux-mm/0406562e-2066-4cf8-9902-b2b0616dd742@kernel.org
> Link: https://lore.kernel.org/linux-mm/e988eff6-1287-425e-a06c-805af5bbf262@nvidia.com
> Link: https://lore.kernel.org/linux-mm/1bda09da-93be-4737-aef0-d47f8c5c9301@suse.cz
> Suggested-by: Vlastimil Babka <vbabka@...e.cz>
> Signed-off-by: Harry Yoo <harry.yoo@...cle.com>
> ---
Thanks, Harry, for the patch.
A quick test on a different machine from the one I originally used to report
this issue shows a decrease from 214s to 100s.
LGTM,
Tested-by: Daniel Gomez <da.gomez@...sung.com>
>
> Not sure if the regression is worse on the reporters' machines due to
> higher core count (or because some cores were busy doing other things,
> dunno).
FWIW, the CI module tests run on an 8-core VM. Depending on the host CPU, the
absolute numbers differ, but the relative performance degradation was
equivalent.
>
> Hopefully this will reduce the time to complete tests,
> and Suren could add his patch on top of this ;)
>
> include/linux/slab.h | 5 ++++
> mm/slab.h | 1 +
> mm/slab_common.c | 52 +++++++++++++++++++++++++++++------------
> mm/slub.c | 55 ++++++++++++++++++++++++--------------------
> 4 files changed, 73 insertions(+), 40 deletions(-)