linux-kernel - Re: [PATCH 3/4] memcg: enable accounting for struct cgroup

Message-ID: <d28233ee-bccb-7bc3-c2ec-461fd7f95e6a@openvz.org>
Date:   Fri, 20 May 2022 23:16:32 +0300
From:   Vasily Averin <vvs@...nvz.org>
To:     Michal Koutný <mkoutny@...e.com>
Cc:     Roman Gushchin <roman.gushchin@...ux.dev>,
        Shakeel Butt <shakeelb@...gle.com>, kernel@...nvz.org,
        linux-kernel@...r.kernel.org, Vlastimil Babka <vbabka@...e.cz>,
        Michal Hocko <mhocko@...e.com>, cgroups@...r.kernel.org
Subject: Re: [PATCH 3/4] memcg: enable accounting for struct cgroup

On 5/20/22 10:24, Vasily Averin wrote:
> On 5/19/22 19:53, Michal Koutný wrote:
>> On Fri, May 13, 2022 at 06:52:12PM +0300, Vasily Averin <vvs@...nvz.org> wrote:
>>> Creating each new cgroup allocates 4Kb for struct cgroup. This is the
>>> largest memory allocation in this scenario and is especially important
>>> for small VMs with 1-2 CPUs.
>>
>> What do you mean by this argument?
>>
>> (On bigger irons, the percpu components become dominant, e.g. struct
>> cgroup_rstat_cpu.)
> 
> Michal, Shakeel,
> thank you very much for your feedback, it helps me understand how to improve
> the methodology of my accounting analysis.
> I considered the general case and looked for places of maximum memory allocations.
> Now I think it would be better to split all called allocations into:
> - common part, called for any cgroup type (i.e. cgroup_mkdir and cgroup_create),
> - per-cgroup parts,
> and focus on 2 corner cases: for single CPU VMs and for "big irons".
> It helps to clarify which allocations are accounting-important and which ones
> can be safely ignored.
> 
> So right now I'm going to redo the calculations and hope it doesn't take long.

common part:  ~11Kb   +  318 bytes percpu
memcg:        ~17Kb   + 4692 bytes percpu
cpu:          ~2.5Kb  + 1036 bytes percpu
cpuset:       ~3Kb    +   12 bytes percpu
blkcg:        ~3Kb    +   12 bytes percpu
pid:          ~1.5Kb  +   12 bytes percpu
perf:         ~320b   +   60 bytes percpu
-------------------------------------------
total:        ~38Kb   + 6142 bytes percpu
currently accounted:    4668 bytes percpu
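
To see how these numbers play out in the two corner cases (single CPU VM vs. "big iron"), here is a rough sketch. It is not part of the measurements. It assumes the percpu figures scale linearly with the number of CPUs, and it uses the exact running totals from the detail tables below instead of the rounded "~Kb" values. The names FOOTPRINT and cgroup_bytes are made up for illustration.

```python
# (slab bytes, percpu bytes per CPU), copied from the per-part totals in this message.
FOOTPRINT = {
    "common": (10840, 318),
    "memcg": (16968, 4692),
    "cpu": (2528, 1036),
    "cpuset": (2872, 12),
    "blkcg": (2784, 12),
    "pid": (1632, 12),
    "perf": (320, 60),
}


def cgroup_bytes(ncpus, parts=FOOTPRINT):
    """Estimated memory for one new cgroup with all listed controllers.

    Assumption: every percpu allocation costs its size once per CPU.
    """
    slab = sum(s for s, _ in parts.values())
    percpu = sum(p for _, p in parts.values())
    return slab + percpu * ncpus


def percpu_share(ncpus, parts=FOOTPRINT):
    """Fraction of the estimate that comes from percpu allocations."""
    percpu = sum(p for _, p in parts.values()) * ncpus
    return percpu / cgroup_bytes(ncpus, parts)


for n in (1, 2, 64, 256):
    print(f"{n:4d} CPUs: {cgroup_bytes(n):8d} bytes, "
          f"percpu share {percpu_share(n):.0%}")
```

With 1 CPU the slab part dominates, while with 64 or more CPUs almost everything is percpu, which matches Michal's remark about bigger irons.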

Results:
a) I'll add accounting for cgroup_rstat_cpu and psi_group_cpu;
they are allocated in the common part and consume 288 bytes percpu.
b) It makes sense to add accounting for simple_xattr(), as Michal recommended,
especially because it can grow over 4Kb.
c) It looks like the rest of the allocations can be ignored.
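
A quick sketch of what item (a) alone changes for the percpu part. It uses only the totals from the summary table and the two call-site sizes from the details below. The names PLANNED and coverage are illustrative, and the effect of any other '+' marks is deliberately left out.

```python
TOTAL_PERCPU = 6142    # "total" percpu bytes from the summary table
ACCOUNTED_NOW = 4668   # "currently accounted" percpu bytes

# Percpu allocations from item (a), sizes as listed in the common part.
PLANNED = {
    "psi_cgroup_alloc": 192,
    "cgroup_rstat_init": 96,
}


def coverage(accounted, total=TOTAL_PERCPU):
    """Fraction of percpu bytes that is charged to the memcg."""
    return accounted / total


before = coverage(ACCOUNTED_NOW)
after = coverage(ACCOUNTED_NOW + sum(PLANNED.values()))
print(f"percpu accounted: {before:.1%} -> {after:.1%}")
```

So item (a) moves percpu coverage from roughly 76% to roughly 81% of the measured total.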

Details are below
('=' -- already accounted, '+' -- to be accounted, '~' -- see KERNFS, '?' -- perhaps later)
Columns: count, mark, size (bytes), total (bytes), running total (bytes), call site.

common part:
16  ~   352     5632    5632    KERNFS (*)
1   +   4096    4096    9728    (cgroup_mkdir+0xe4)
1       584     584     10312   (radix_tree_node_alloc.constprop.0+0x89)
1       192     192     10504   (__d_alloc+0x29)
2       72      144     10648   (avc_alloc_node+0x27)
2       64      128     10776   (percpu_ref_init+0x6a)
1       64      64      10840   (memcg_list_lru_alloc+0x21a)

1   +   192     192     192     call_site=psi_cgroup_alloc+0x1e
1   +   96      96      288     call_site=cgroup_rstat_init+0x5f
2       12      24      312     call_site=percpu_ref_init+0x23
1       6       6       318     call_site=__percpu_counter_init+0x22

(*) KERNFS includes:
1   +  128      (__kernfs_new_node+0x4d)	kernfs node
1   +   88      (__kernfs_iattrs+0x57)		kernfs iattrs
1   +   96      (simple_xattr_alloc+0x28)	simple_xattr_alloc() that can grow over 4Kb 
1   ?   32      (simple_xattr_set+0x59)
1       8       (__kernfs_new_node+0x30)
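
For reference, a small sketch that re-adds the KERNFS pieces and shows how much of the slab total is kernfs. The per-node sizes are the five lines above, and the node counts are the KERNFS rows of each part in this message (perf has none). The dictionary names are illustrative, and it assumes every kernfs node costs the same 352 bytes, as the tables do.

```python
# Per-node allocations, bytes, keyed by call site.
KERNFS_NODE = {
    "__kernfs_new_node+0x4d": 128,   # kernfs node
    "__kernfs_iattrs+0x57": 88,      # kernfs iattrs
    "simple_xattr_alloc+0x28": 96,   # can grow over 4Kb
    "simple_xattr_set+0x59": 32,
    "__kernfs_new_node+0x30": 8,
}

# Number of kernfs nodes created per part (count column of the KERNFS rows).
KERNFS_NODES = {
    "common": 16,
    "memory": 14,
    "cpu": 5,
    "cpuset": 5,
    "blkcg": 6,
    "pid": 3,
}

per_node = sum(KERNFS_NODE.values())
nodes = sum(KERNFS_NODES.values())
kernfs_total = per_node * nodes

print(f"{per_node} bytes per node, {nodes} nodes, {kernfs_total} bytes total")
```

Under these assumptions kernfs accounts for a bit under half of the ~38Kb non-percpu total.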


memory:
------
1   +   8192    8192    8192    (mem_cgroup_css_alloc+0x4a)
14  ~   352     4928    13120   KERNFS
1   +   2048    2048    15168   (mem_cgroup_css_alloc+0xdd)
1       1024    1024    16192   (alloc_shrinker_info+0x79)
1       584     584     16776   (radix_tree_node_alloc.constprop.0+0x89)
2       64      128     16904   (percpu_ref_init+0x6a)
1       64      64      16968   (mem_cgroup_css_online+0x32)

1   =   3684    3684    3684    call_site=mem_cgroup_css_alloc+0x9e
1   =   984     984     4668    call_site=mem_cgroup_css_alloc+0xfd
2       12      24      4692    call_site=percpu_ref_init+0x23

cpu:
---
5   ~   352     1760    1760    KERNFS
1       640     640     2400    (sched_create_group+0x1b)
1       64      64      2464    (percpu_ref_init+0x6a)
1       32      32      2496    (alloc_fair_sched_group+0x55)
1       32      32      2528    (alloc_fair_sched_group+0x31)

4   +   512     512      512    (alloc_fair_sched_group+0x16c)
4   +   512     512     1024    (alloc_fair_sched_group+0x13e)
1       12      12      1036    call_site=percpu_ref_init+0x23

cpuset:
------
5   ~   352     1760    1760    KERNFS
1       1024    1024    2784    (cpuset_css_alloc+0x2f)
1       64      64      2848    (percpu_ref_init+0x6a)
3       8       24      2872    (alloc_cpumask_var_node+0x1f)

1       12      12      12      call_site=percpu_ref_init+0x23

blkcg:
-----
6   ~   352     2112    2112    KERNFS
1       512     512     2624    (blkcg_css_alloc+0x37)
1       64      64      2688    (percpu_ref_init+0x6a)
1       32      32      2720    (ioprio_alloc_cpd+0x39)
1       32      32      2752    (ioc_cpd_alloc+0x39)
1       32      32      2784    (blkcg_css_alloc+0x66)

1       12      12      12      call_site=percpu_ref_init+0x23

pid:
---
3   ~   352     1056    1056    KERNFS
1       512     512     1568    (pids_css_alloc+0x1b)
1       64      64      1632    (percpu_ref_init+0x6a)

1       12      12      12      call_site=percpu_ref_init+0x23

perf:
----
1       256     256     256     (perf_cgroup_css_alloc+0x1c)
1       64      64      320     (percpu_ref_init+0x6a)

1       48      48      48      call_site=perf_cgroup_css_alloc+0x33
1       12      12      60      call_site=percpu_ref_init+0x23

Thank you,
	Vasily Averin