linux-kernel - Re: [PATCH] slub: limit number of slabs to scan in count

Message-ID: <38ef26aa-169b-48ad-81ad-8378e7a38f25@suse.cz>
Date: Fri, 12 Apr 2024 09:48:40 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: "Christoph Lameter (Ampere)" <cl@...ux.com>,
 Jianfeng Wang <jianfeng.w.wang@...cle.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, penberg@...nel.org,
 rientjes@...gle.com, iamjoonsoo.kim@....com, akpm@...ux-foundation.org,
 junxiao.bi@...cle.com
Subject: Re: [PATCH] slub: limit number of slabs to scan in count_partial()

On 4/11/24 7:02 PM, Christoph Lameter (Ampere) wrote:
> On Thu, 11 Apr 2024, Jianfeng Wang wrote:
> 
>> So, the fix is to limit the number of slabs to scan in
>> count_partial(), and output an approximated result if the list is too
>> long. Default to 10000 which should be enough for most sane cases.
> 
> 
> That is a creative approach. The problem though is that objects on the 
> partial lists are kind of sorted. The partial slabs with only a few 
> objects available are at the start of the list so that allocations cause 
> them to be removed from the partial list fast. Full slabs do not need to 
> be tracked on any list.
> 
> The partial slabs with few objects are put at the end of the partial list 
> in the hope that the few objects remaining will also be freed which would 
> allow the freeing of the slab folio.
> 
> So the object density may be higher at the beginning of the list.
> 
> kmem_cache_shrink() will explicitly sort the partial lists to put the 
> partial pages in that order.
> 
> Can you run some tests showing the difference between the estimation and 
> the real count?

Maybe we could also get a more accurate picture by counting N slabs from the
head and N from the tail and approximating from both. That's also not perfect,
but it should be able to answer the question of whether the kmem_cache is
significantly fragmented, which is probably the only information we can get
from comparing slabinfo's <active_objs> with <num_objs>. IIRC the latter is
always accurate and the former never is, because of cpu slabs, so we never
know exactly how many objects are in use. By comparing both we can get an idea
of the fragmentation, and if this change doesn't make that estimate
significantly worse, it should be acceptable.
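
A rough userspace sketch of the head-plus-tail sampling described above, not
the actual patch. struct slab_sample, estimate_free() and max_scan are
hypothetical names made up for illustration, and the partial list is modelled
as a plain array rather than a kernel list_head:

```c
#include <stddef.h>

/* Hypothetical stand-in for one slab on a node's partial list. */
struct slab_sample {
	unsigned int objects;	/* total objects the slab can hold */
	unsigned int inuse;	/* objects currently allocated */
};

/*
 * Estimate the number of free objects on a partial list of nr slabs.
 * Scans at most max_scan slabs from the head and max_scan from the tail,
 * then scales the sampled sum up to the full list length. Falls back to
 * an exact count when the list is short enough to be covered entirely.
 */
static unsigned long estimate_free(const struct slab_sample *slabs,
				   size_t nr, size_t max_scan)
{
	unsigned long long free = 0;
	size_t i;

	/* Assumption: a zero scan budget simply yields no estimate. */
	if (max_scan == 0)
		return 0;

	/* Short list: head and tail windows would meet, so count exactly. */
	if (nr / 2 <= max_scan) {
		for (i = 0; i < nr; i++)
			free += slabs[i].objects - slabs[i].inuse;
		return (unsigned long)free;
	}

	/* Long list: sample both ends so a sorted list does not bias us. */
	for (i = 0; i < max_scan; i++) {
		free += slabs[i].objects - slabs[i].inuse;
		free += slabs[nr - 1 - i].objects - slabs[nr - 1 - i].inuse;
	}

	/* Extrapolate from 2 * max_scan sampled slabs to all nr slabs. */
	return (unsigned long)(free * nr / (2 * max_scan));
}
```

If the list is sorted with the nearly full slabs at the head and the nearly
empty ones at the tail, a head-only scan would underestimate the free objects,
while sampling both ends pulls the estimate back towards the middle. It is
still only an approximation, since nothing says the density changes linearly
along the list.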