Message-ID: <587f5e6b-d543-4028-85c8-93cc8f581d02@suse.cz>
Date: Thu, 9 May 2024 16:25:05 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: David Rientjes <rientjes@...gle.com>, Joonsoo Kim
<iamjoonsoo.kim@....com>, Christoph Lameter <cl@...ux.com>,
Pekka Enberg <penberg@...nel.org>, Andrew Morton
<akpm@...ux-foundation.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>, patches@...ts.linux.dev,
Roman Gushchin <roman.gushchin@...ux.dev>,
Hyeonggon Yoo <42.hyeyoo@...il.com>,
Chengming Zhou <chengming.zhou@...ux.dev>
Subject: [GIT PULL] slab updates for 6.10
Hi Linus,
please pull the latest slab updates from:
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git tags/slab-for-6.10
Sending this early due to upcoming LSF/MM travel and chances there's no rc8.
Thanks,
Vlastimil
======================================
This time it's mostly random cleanups and fixes, with two performance fixes
that might have significant impact, but limited to systems experiencing
particularly bad corner-case scenarios rather than being general performance
improvements.
The memcg hook changes are going through the mm tree due to dependencies.
- Prevent stalls when reading /proc/slabinfo (Jianfeng Wang)
This fixes a long-standing problem that can happen with workloads whose
alloc/free patterns result in many partially used slabs (e.g. in the dentry
cache). Reading /proc/slabinfo will traverse the long partial slab list under
a spinlock with irqs disabled and thus can stall other processes or even
trigger the lockup detection. The traversal is only done to count free
objects so that the <active_objs> column can be reported along with
<num_objs>.

To avoid affecting fast paths with another shared counter (attempted in the
past) or complex partial list traversal schemes that allow rescheduling, the
chosen solution resorts to approximation - when the partial list is over
10000 slabs long, we only traverse the first 5000 slabs from the head and the
tail each and use the average of those to estimate the whole list. Both head
and tail are sampled because the slabs near the head tend to have more free
objects than the slabs towards the tail (a rough sketch of the idea follows
below).

It is expected that the approximation should not break existing
/proc/slabinfo consumers. The <num_objs> field is still accurate and reflects
the overall kmem_cache footprint. The <active_objs> field was already
imprecise due to cpu and percpu-partial slabs, so it can't be relied upon to
determine exact cache usage. The difference between <active_objs> and
<num_objs> is mainly useful to determine the slab fragmentation, and that
will remain possible even with the approximation in place.
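
For illustration, a minimal sketch of the sampling idea is below. It is not
the mainline count_partial_free_approx() implementation; the function name,
the exact constants and the locking details are simplified, and it assumes
SLUB-internal types (struct kmem_cache_node, struct slab) from mm/slab.h:

#define MAX_PARTIAL_TO_SCAN	10000

static unsigned long partial_free_approx_sketch(struct kmem_cache_node *n)
{
	unsigned long flags, scanned = 0, free = 0;
	struct slab *slab;

	spin_lock_irqsave(&n->list_lock, flags);
	if (n->nr_partial <= MAX_PARTIAL_TO_SCAN) {
		/* Short list: count free objects on every slab exactly. */
		list_for_each_entry(slab, &n->partial, slab_list)
			free += slab->objects - slab->inuse;
	} else {
		/* Long list: sample the first 5000 slabs from the head ... */
		list_for_each_entry(slab, &n->partial, slab_list) {
			if (scanned++ >= MAX_PARTIAL_TO_SCAN / 2)
				break;
			free += slab->objects - slab->inuse;
		}
		/* ... and the first 5000 slabs from the tail ... */
		scanned = 0;
		list_for_each_entry_reverse(slab, &n->partial, slab_list) {
			if (scanned++ >= MAX_PARTIAL_TO_SCAN / 2)
				break;
			free += slab->objects - slab->inuse;
		}
		/* ... then scale the sampled average to the whole list. */
		free = DIV_ROUND_UP(free * n->nr_partial, MAX_PARTIAL_TO_SCAN);
	}
	spin_unlock_irqrestore(&n->list_lock, flags);

	return free;
}
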
- Prevent allocating many slabs when a NUMA node is full (Chen Jun)
Currently, on NUMA systems with one node under significantly higher pressure
than the others, the fallback strategy may result in each kmalloc_node() that
can't be satisfied from the preferred node allocating a new slab on a
fallback node instead of reusing the slabs already on that node's partial
list.

This is now fixed and the partial lists of fallback nodes are checked even
for kmalloc_node() allocations. It's still preferred to allocate a new slab
on the requested node before falling back, but only with a GFP_NOWAIT
attempt, which will fail quickly when the node is under significant memory
pressure (see the sketch below).
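
The resulting order of attempts for a kmalloc_node() miss is roughly sketched
below. The helper names are hypothetical stand-ins for illustration only; the
real logic lives in the get_partial() / ___slab_alloc() slow path in
mm/slub.c:

/* Illustrative flow; the helpers below are hypothetical, not real SLUB API. */
static void *kmalloc_node_miss_sketch(struct kmem_cache *s, gfp_t gfp,
				      int node)
{
	void *obj;

	/* 1) Reuse a partially filled slab already on the requested node. */
	obj = try_partial_on_node(s, node);
	if (obj)
		return obj;

	/*
	 * 2) Still prefer a brand new slab on the requested node, but only
	 *    as a cheap non-blocking attempt so it fails quickly when that
	 *    node is under significant memory pressure.
	 */
	obj = try_new_slab_on_node(s, GFP_NOWAIT | __GFP_THISNODE, node);
	if (obj)
		return obj;

	/*
	 * 3) Only then fall back to other nodes, and (newly) check their
	 *    partial lists instead of always allocating another slab there.
	 */
	obj = try_partial_on_fallback_nodes(s, gfp, node);
	if (obj)
		return obj;

	/* 4) Finally, allocate a new slab wherever the original gfp allows. */
	return try_new_slab_on_node(s, gfp, NUMA_NO_NODE);
}
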
- More SLAB removal related cleanups (Xiu Jianfeng, Hyunmin Lee)
- Fix slub_kunit self-test with hardened freelists (Guenter Roeck)
- Mark racy accesses for KCSAN (linke li)
- Misc cleanups (Xiongwei Song, Haifeng Xu, Sangyun Kim)
----------------------------------------------------------------
Chen Jun (1):
mm/slub: Reduce memory consumption in extreme scenarios
Guenter Roeck (1):
mm/slub, kunit: Use inverted data to corrupt kmem cache
Haifeng Xu (1):
slub: Set __GFP_COMP in kmem_cache by default
Hyunmin Lee (2):
mm/slub: create kmalloc 96 and 192 caches regardless cache size order
mm/slub: remove the check for NULL kmalloc_caches
Jianfeng Wang (2):
slub: introduce count_partial_free_approx()
slub: use count_partial_free_approx() in slab_out_of_memory()
Sangyun Kim (1):
mm/slub: remove duplicate initialization for early_kmem_cache_node_alloc()
Xiongwei Song (3):
mm/slub: remove the check of !kmem_cache_has_cpu_partial()
mm/slub: add slub_get_cpu_partial() helper
mm/slub: simplify get_partial_node()
Xiu Jianfeng (2):
mm/slub: remove dummy slabinfo functions
mm/slub: correct comment in do_slab_free()
linke li (2):
mm/slub: mark racy accesses on slab->slabs
mm/slub: mark racy access on slab->freelist
lib/slub_kunit.c | 2 +-
mm/slab.h | 3 --
mm/slab_common.c | 27 +++++--------
mm/slub.c | 118 ++++++++++++++++++++++++++++++++++++++++---------------
4 files changed, 96 insertions(+), 54 deletions(-)