Message-ID: <20250423084306.65706-1-link@vivo.com>
Date: Wed, 23 Apr 2025 16:43:03 +0800
From: Huan Yang <link@...o.com>
To: Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeel.butt@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>,
cgroups@...r.kernel.org,
linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Cc: opensource.kernel@...o.com,
Huan Yang <link@...o.com>
Subject: [PATCH 0/2] Use kmem_cache for memcg alloc
mem_cgroup_alloc() allocates the mem_cgroup struct together with several
related structures, including mem_cgroup_per_node.
On our machine (arm64, 16G RAM, kernel 6.6, 1 NUMA node, cgroup v2 memory
controller with nokmem, nosocket and cgroup_disable=pressure), run the
following in a shell:
echo 1 > /sys/kernel/tracing/events/kmem/kmalloc/enable
echo 1 > /sys/kernel/tracing/tracing_on
cat /sys/kernel/tracing/trace_pipe | grep kmalloc | grep mem_cgroup
# In another terminal
echo +memory > /sys/fs/cgroup/cgroup.subtree_control
The ftrace output then looks something like this:
# mem_cgroup struct alloc in mem_cgroup_alloc
sh-6312 [000] ..... 58015.698365: kmalloc:
call_site=mem_cgroup_css_alloc+0xd8/0x5b4 ptr=000000003e4c3799
bytes_req=2312 bytes_alloc=4096 gfp_flags=GFP_KERNEL|__GFP_ZERO
node=-1 accounted=false
# mem_cgroup_per_node alloc in alloc_mem_cgroup_per_node_info
sh-6312 [000] ..... 58015.698389: kmalloc:
call_site=mem_cgroup_css_alloc+0x1d8/0x5b4 ptr=00000000d798700c
bytes_req=2896 bytes_alloc=4096 gfp_flags=GFP_KERNEL|__GFP_ZERO
node=0 accounted=false
Both allocations go through kmalloc(): the requested sizes fall between
2K and 4K, but they end up in the 4K slab object_size.
The reason is that kmalloc() only provides a fixed set of pre-defined
size classes; on our machine: 64, 128, 192, 256, ... 2k, 4k, 8k.
The mem_cgroup and mem_cgroup_per_node requests (2312 and 2896 bytes)
land between 2K and 4K, and since there is no intermediate size class
(e.g. 3k), both get rounded up to the 4K slab object_size.
So each memcg wastes 1784 bytes for the mem_cgroup struct, plus 1200
bytes for the single per-node struct, 2984 bytes per memcg in total.
With multiple NUMA nodes the waste grows further, roughly:
8 bytes * nr_node_ids + 1200 bytes * nr_node_ids
This is a bit wasteful.
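The rounding can also be checked from code. A minimal sketch (a
hypothetical debug helper, not part of the patches) using
kmalloc_size_roundup() from <linux/slab.h>:

#include <linux/slab.h>
#include <linux/printk.h>

static void __maybe_unused memcg_kmalloc_waste_probe(void)
{
	/* 2312 and 2896 are the bytes_req values from the traces above */
	size_t memcg_sz = kmalloc_size_roundup(2312);	/* 4096 here */
	size_t pn_sz = kmalloc_size_roundup(2896);	/* 4096 here */

	pr_info("mem_cgroup waste: %zu, per-node waste: %zu\n",
		memcg_sz - 2312, pn_sz - 2896);
}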
This patchset adds two dedicated kmem_caches for these allocations:
patch 1 - mem_cgroup, cache named memcg_cachep
patch 2 - mem_cgroup_per_node, cache named memcg_pn_cachep
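Roughly, the approach looks like this (a sketch only; names and
placement are illustrative, see the individual patches for the real
code):

static struct kmem_cache *memcg_cachep;
static struct kmem_cache *memcg_pn_cachep;

/* Create the caches once, e.g. during memcg init. */
memcg_cachep = kmem_cache_create("mem_cgroup",
		struct_size_t(struct mem_cgroup, nodeinfo, nr_node_ids),
		0, SLAB_HWCACHE_ALIGN, NULL);
memcg_pn_cachep = KMEM_CACHE(mem_cgroup_per_node, SLAB_HWCACHE_ALIGN);

/*
 * Then replace the kzalloc()/kzalloc_node() calls in mem_cgroup_alloc()
 * and alloc_mem_cgroup_per_node_info() with:
 */
memcg = kmem_cache_zalloc(memcg_cachep, GFP_KERNEL);
pn = kmem_cache_alloc_node(memcg_pn_cachep, GFP_KERNEL | __GFP_ZERO, node);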
The benefit can be observed with:
echo 1 > /sys/kernel/tracing/events/kmem/kmem_cache_alloc/enable
echo 1 > /sys/kernel/tracing/tracing_on
cat /sys/kernel/tracing/trace_pipe | grep kmem_cache_alloc | grep mem_cgroup
# In another terminal
echo +memory > /sys/fs/cgroup/cgroup.subtree_control
The output then looks something like this:
sh-9827 [000] ..... 289.513598: kmem_cache_alloc:
call_site=mem_cgroup_css_alloc+0xbc/0x5d4 ptr=00000000695c1806
bytes_req=2312 bytes_alloc=2368 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1
accounted=false
sh-9827 [000] ..... 289.513602: kmem_cache_alloc:
call_site=mem_cgroup_css_alloc+0x1b8/0x5d4 ptr=000000002989e63a
bytes_req=2896 bytes_alloc=2944 gfp_flags=GFP_KERNEL|__GFP_ZERO node=0
accounted=false
This means mem_cgroup requests 2312 bytes and gets a 2368-byte object,
and mem_cgroup_per_node requests 2896 bytes and gets a 2944-byte object.
The extra bytes come from creating each kmem_cache with
SLAB_HWCACHE_ALIGN.
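(Assuming 64-byte cache lines on this arm64 machine, SLAB_HWCACHE_ALIGN
rounds the object size up to the next cache line: 2312 -> 37 * 64 = 2368
and 2896 -> 46 * 64 = 2944.)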
Without SLAB_HWCACHE_ALIGN, the output looks like this instead:
sh-9269 [003] ..... 80.396366: kmem_cache_alloc:
call_site=mem_cgroup_css_alloc+0xbc/0x5d4 ptr=000000005b12b475
bytes_req=2312 bytes_alloc=2312 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1
accounted=false
sh-9269 [003] ..... 80.396411: kmem_cache_alloc:
call_site=mem_cgroup_css_alloc+0x1b8/0x5d4 ptr=00000000f347adc6
bytes_req=2896 bytes_alloc=2896 gfp_flags=GFP_KERNEL|__GFP_ZERO node=0
accounted=false
Here bytes_alloc matches the request exactly, but I think
SLAB_HWCACHE_ALIGN is worth the small overhead, so this patchset uses it
by default. If anything is wrong with that choice, please correct me.
So, how many memcgs show up on a real machine? That depends on how
memory is grouped per task. On our phones we use a per-app memcg, so the
count depends on how many apps the user has installed and opened,
typically 10 - 100. With per-uid/pid grouping there can be more, perhaps
400 - 1000 or above.
Huan Yang (2):
mm/memcg: use kmem_cache when alloc memcg
mm/memcg: use kmem_cache when alloc memcg pernode info
mm/memcontrol.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
base-commit: 2c9c612abeb38aab0e87d48496de6fd6daafb00b
--
2.48.1