Message-ID: <20250423084306.65706-1-link@vivo.com>
Date: Wed, 23 Apr 2025 16:43:03 +0800
From: Huan Yang <link@...o.com>
To: Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeel.butt@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>,
cgroups@...r.kernel.org,
linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Cc: opensource.kernel@...o.com,
Huan Yang <link@...o.com>
Subject: [PATCH 0/2] Use kmem_cache for memcg alloc
mem_cgroup_alloc() allocates the mem_cgroup struct together with several
related structures, including mem_cgroup_per_node.
On our machine (arm64, 16G RAM, kernel 6.6, 1 NUMA node, cgroup v2 memory
controller with nokmem, nosocket and cgroup_disable=pressure), run the
following in a shell:
echo 1 > /sys/kernel/tracing/events/kmem/kmalloc/enable
echo 1 > /sys/kernel/tracing/tracing_on
cat /sys/kernel/tracing/trace_pipe | grep kmalloc | grep mem_cgroup
# In another terminal
echo +memory > /sys/fs/cgroup/cgroup.subtree_control
The ftrace output then looks something like this:
# mem_cgroup struct alloc in mem_cgroup_alloc
sh-6312 [000] ..... 58015.698365: kmalloc:
call_site=mem_cgroup_css_alloc+0xd8/0x5b4 ptr=000000003e4c3799
bytes_req=2312 bytes_alloc=4096 gfp_flags=GFP_KERNEL|__GFP_ZERO
node=-1 accounted=false
# mem_cgroup_per_node alloc in alloc_mem_cgroup_per_node_info
sh-6312 [000] ..... 58015.698389: kmalloc:
call_site=mem_cgroup_css_alloc+0x1d8/0x5b4 ptr=00000000d798700c
bytes_req=2896 bytes_alloc=4096 gfp_flags=GFP_KERNEL|__GFP_ZERO
node=0 accounted=false
Both allocations go through kmalloc(): the requested sizes fall between
2K and 4K, but they end up in the 4K slab object_size.
The reason is that kmalloc() only provides a fixed set of pre-defined
size classes; on our machine: 64, 128, 192, 256, ... 2k, 4k, 8k.
The mem_cgroup and mem_cgroup_per_node requests (2312 and 2896 bytes)
land between 2K and 4K, and since there is no intermediate size class
(e.g. 3k), both get rounded up to the 4K slab object_size.
So each memcg wastes 1784 bytes for the mem_cgroup struct, plus 1200
bytes for the single per-node struct, 2984 bytes per memcg in total.
With multiple NUMA nodes the waste grows further, roughly:
8 bytes * nr_node_ids + 1200 bytes * nr_node_ids
This is a bit wasteful.
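The rounding can also be checked from code. A minimal sketch (a
hypothetical debug helper, not part of the patches) using
kmalloc_size_roundup() from <linux/slab.h>:

#include <linux/slab.h>
#include <linux/printk.h>

static void __maybe_unused memcg_kmalloc_waste_probe(void)
{
	/* 2312 and 2896 are the bytes_req values from the traces above */
	size_t memcg_sz = kmalloc_size_roundup(2312);	/* 4096 here */
	size_t pn_sz = kmalloc_size_roundup(2896);	/* 4096 here */

	pr_info("mem_cgroup waste: %zu, per-node waste: %zu\n",
		memcg_sz - 2312, pn_sz - 2896);
}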
This patchset adds two dedicated kmem_caches for these allocations:
patch 1 - mem_cgroup, cache named memcg_cachep
patch 2 - mem_cgroup_per_node, cache named memcg_pn_cachep
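Roughly, the approach looks like this (a sketch only; names and
placement are illustrative, see the individual patches for the real
code):

static struct kmem_cache *memcg_cachep;
static struct kmem_cache *memcg_pn_cachep;

/* Create the caches once, e.g. during memcg init. */
memcg_cachep = kmem_cache_create("mem_cgroup",
		struct_size_t(struct mem_cgroup, nodeinfo, nr_node_ids),
		0, SLAB_HWCACHE_ALIGN, NULL);
memcg_pn_cachep = KMEM_CACHE(mem_cgroup_per_node, SLAB_HWCACHE_ALIGN);

/*
 * Then replace the kzalloc()/kzalloc_node() calls in mem_cgroup_alloc()
 * and alloc_mem_cgroup_per_node_info() with:
 */
memcg = kmem_cache_zalloc(memcg_cachep, GFP_KERNEL);
pn = kmem_cache_alloc_node(memcg_pn_cachep, GFP_KERNEL | __GFP_ZERO, node);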
The benefit can be observed with:
echo 1 > /sys/kernel/tracing/events/kmem/kmem_cache_alloc/enable
echo 1 > /sys/kernel/tracing/tracing_on
cat /sys/kernel/tracing/trace_pipe | grep kmem_cache_alloc | grep mem_cgroup
# In another terminal
echo +memory > /sys/fs/cgroup/cgroup.subtree_control
The output then looks something like this:
sh-9827 [000] ..... 289.513598: kmem_cache_alloc:
call_site=mem_cgroup_css_alloc+0xbc/0x5d4 ptr=00000000695c1806
bytes_req=2312 bytes_alloc=2368 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1
accounted=false
sh-9827 [000] ..... 289.513602: kmem_cache_alloc:
call_site=mem_cgroup_css_alloc+0x1b8/0x5d4 ptr=000000002989e63a
bytes_req=2896 bytes_alloc=2944 gfp_flags=GFP_KERNEL|__GFP_ZERO node=0
accounted=false
This means mem_cgroup requests 2312 bytes and gets a 2368-byte object,
and mem_cgroup_per_node requests 2896 bytes and gets a 2944-byte object.
The extra bytes come from creating each kmem_cache with
SLAB_HWCACHE_ALIGN.
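(Assuming 64-byte cache lines on this arm64 machine, SLAB_HWCACHE_ALIGN
rounds the object size up to the next cache line: 2312 -> 37 * 64 = 2368
and 2896 -> 46 * 64 = 2944.)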
Without SLAB_HWCACHE_ALIGN, the output looks like this instead:
sh-9269 [003] ..... 80.396366: kmem_cache_alloc:
call_site=mem_cgroup_css_alloc+0xbc/0x5d4 ptr=000000005b12b475
bytes_req=2312 bytes_alloc=2312 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1
accounted=false
sh-9269 [003] ..... 80.396411: kmem_cache_alloc:
call_site=mem_cgroup_css_alloc+0x1b8/0x5d4 ptr=00000000f347adc6
bytes_req=2896 bytes_alloc=2896 gfp_flags=GFP_KERNEL|__GFP_ZERO node=0
accounted=false
Here bytes_alloc matches the request exactly, but I think
SLAB_HWCACHE_ALIGN is worth the small overhead, so this patchset uses it
by default. If anything is wrong with that choice, please correct me.
So, how many memcgs show up on a real machine? That depends on how
memory is grouped per task. On our phones we use a per-app memcg, so the
count depends on how many apps the user has installed and opened,
typically 10 - 100. With per-uid/pid grouping there can be more, perhaps
400 - 1000 or above.
Huan Yang (2):
mm/memcg: use kmem_cache when alloc memcg
mm/memcg: use kmem_cache when alloc memcg pernode info
mm/memcontrol.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
base-commit: 2c9c612abeb38aab0e87d48496de6fd6daafb00b
--
2.48.1