Message-ID: <93497e03-1acf-483e-8695-e103fd1bc044@oracle.com>
Date: Thu, 22 Feb 2024 23:36:01 -0800
From: Jianfeng Wang <jianfeng.w.wang@...cle.com>
To: "Christoph Lameter (Ampere)" <cl@...ux.com>,
Chengming Zhou <chengming.zhou@...ux.dev>
Cc: Vlastimil Babka <vbabka@...e.cz>, David Rientjes <rientjes@...gle.com>,
penberg@...nel.org, iamjoonsoo.kim@....com, akpm@...ux-foundation.org,
roman.gushchin@...ux.dev, 42.hyeyoo@...il.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Chengming Zhou <zhouchengming@...edance.com>
Subject: Re: [PATCH] slub: avoid scanning all partial slabs in get_slabinfo()
On 2/22/24 7:02 PM, Christoph Lameter (Ampere) wrote:
> On Thu, 22 Feb 2024, Chengming Zhou wrote:
>
>> Anyway, I put the code below for discussion...
>
> Can we guesstimate the free objects based on the number of partial slabs? That number is available.
>
Yes.
I've thought about calculating the average number of free objects in a
partial slab (through sampling) and then estimating the total number of
free objects as (avg * n->nr_partial).
See the following.
---
mm/slub.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 63d281dfacdb..13385761049c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2963,6 +2963,8 @@ static inline bool free_debug_processing(struct kmem_cache *s,
 #endif /* CONFIG_SLUB_DEBUG */
 
 #if defined(CONFIG_SLUB_DEBUG) || defined(SLAB_SUPPORTS_SYSFS)
+#define MAX_PARTIAL_TO_SCAN 10000
+
 static unsigned long count_partial(struct kmem_cache_node *n,
 					int (*get_count)(struct slab *))
 {
@@ -2971,8 +2973,22 @@ static unsigned long count_partial(struct kmem_cache_node *n,
 	struct slab *slab;
 
 	spin_lock_irqsave(&n->list_lock, flags);
-	list_for_each_entry(slab, &n->partial, slab_list)
-		x += get_count(slab);
+	if (n->nr_partial > MAX_PARTIAL_TO_SCAN) {
+		/* Estimate total count of objects via sampling */
+		unsigned long sample_rate = n->nr_partial / MAX_PARTIAL_TO_SCAN;
+		unsigned long scanned = 0;
+		unsigned long counted = 0;
+		list_for_each_entry(slab, &n->partial, slab_list) {
+			if (++scanned % sample_rate == 0) {
+				x += get_count(slab);
+				counted++;
+			}
+		}
+		x = mult_frac(x, n->nr_partial, counted);
+	} else {
+		list_for_each_entry(slab, &n->partial, slab_list)
+			x += get_count(slab);
+	}
 	spin_unlock_irqrestore(&n->list_lock, flags);
 	return x;
 }
--
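To make the arithmetic in the patch above easier to check outside the kernel,
here is a minimal userspace sketch of the same estimator. It is only an
illustration under assumptions: struct fake_slab, scale() and estimate_free()
are hypothetical names, an array stands in for the n->partial list, and no
locking is modelled. scale() uses the same quotient/remainder arithmetic as
mult_frac().

```c
#include <assert.h>

#define MAX_PARTIAL_TO_SCAN 10000UL

/* Hypothetical stand-in for a partial slab: only its free-object count. */
struct fake_slab {
	unsigned long free;
};

/* x * n / d, split into quotient and remainder like the kernel's mult_frac(). */
static unsigned long scale(unsigned long x, unsigned long n, unsigned long d)
{
	assert(d != 0);
	return (x / d) * n + ((x % d) * n) / d;
}

/* Array-based analogue of count_partial() with the sampling branch. */
static unsigned long estimate_free(const struct fake_slab *slabs,
				   unsigned long nr_partial)
{
	unsigned long x = 0, scanned = 0, counted = 0;
	unsigned long sample_rate, i;

	if (nr_partial <= MAX_PARTIAL_TO_SCAN) {
		/* Short list: exact count. */
		for (i = 0; i < nr_partial; i++)
			x += slabs[i].free;
		return x;
	}

	/*
	 * Long list: read every sample_rate-th element. As in the patch,
	 * the loop still visits every element; only the per-element read
	 * is skipped for unsampled entries.
	 */
	sample_rate = nr_partial / MAX_PARTIAL_TO_SCAN;
	for (i = 0; i < nr_partial; i++) {
		if (++scanned % sample_rate == 0) {
			x += slabs[i].free;
			counted++;
		}
	}

	/* Scale the sampled sum up to the full list length. */
	return scale(x, nr_partial, counted);
}
```

Because sample_rate comes from integer division, a list of 10001 to 19999
entries gets sample_rate == 1 in this sketch, so every entry is still counted
and the result is exact in that range.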
> How accurate does the accounting need to be? We also have fuzzy accounting in the VM counters.
In my experience, the total number of objects in a kmem_cache tells us
whether the cache has been heavily used by a workload. When the total
number is large and the number of free objects is small, then either these
objects are really in use or there is a *memory leak* going on (which then
must be diagnosed further). However, if the number of free objects is large,
all we can conclude is that slab memory fragmentation has occurred.
So I think the object accounting needn't be exact. We only need to tell
whether a large percentage of slab objects is free. The code above estimates
this by sampling, which should do the job if we take enough samples.
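
As an illustration of why a rough count is enough for this kind of triage,
the decision can be sketched as below. This is not part of the patch:
classify_cache(), the enum names and both thresholds are hypothetical and
would be chosen by whoever reads the numbers.

```c
/* Hypothetical triage outcomes for one kmem_cache. */
enum cache_state {
	CACHE_LIGHT,           /* few objects in total, not interesting */
	CACHE_BUSY_OR_LEAKING, /* many objects, few free: in use or leaked */
	CACHE_FRAGMENTED,      /* many objects, many free: fragmentation */
};

/*
 * total_objs:        total objects in the cache
 * free_objs:         (possibly estimated) free objects
 * heavy_threshold:   assumed cut-off for "heavily used"
 * free_pct_threshold: assumed cut-off, in percent, for "mostly free"
 */
static enum cache_state classify_cache(unsigned long total_objs,
				       unsigned long free_objs,
				       unsigned long heavy_threshold,
				       unsigned int free_pct_threshold)
{
	unsigned long long free_pct;

	if (total_objs == 0 || total_objs < heavy_threshold)
		return CACHE_LIGHT;

	free_pct = (unsigned long long)free_objs * 100 / total_objs;
	if (free_pct >= free_pct_threshold)
		return CACHE_FRAGMENTED;

	return CACHE_BUSY_OR_LEAKING;
}
```

With a 50% cut-off, an estimate of 750000 free objects and an exact count of
800000 out of 1000000 land in the same bucket, which is the sense in which
the accounting does not have to be exact.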