linux-kernel - Re: kernfs memcg accounting

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <65244222-02c5-6e8d-7f4b-83651f378ce2@openvz.org>
Date:   Thu, 5 May 2022 00:16:12 +0300
From:   Vasily Averin <vvs@...nvz.org>
To:     Michal Koutný <mkoutny@...e.com>
Cc:     Roman Gushchin <roman.gushchin@...ux.dev>,
        Vlastimil Babka <vbabka@...e.cz>,
        Shakeel Butt <shakeelb@...gle.com>, kernel@...nvz.org,
        Florian Westphal <fw@...len.de>, linux-kernel@...r.kernel.org,
        Michal Hocko <mhocko@...e.com>, cgroups@...r.kernel.org,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Tejun Heo <tj@...nel.org>
Subject: Re: kernfs memcg accounting

On 5/4/22 17:10, Michal Koutný wrote:
> On Wed, May 04, 2022 at 12:00:18PM +0300, Vasily Averin <vvs@...nvz.org> wrote:
>> As far as I understand, Roman chose the parent memcg because it was a special
>> case of creating a new memory group. He temporally changed active memcg
>> in mem_cgroup_css_alloc() and properly accounted all required memcg-specific
>> allocations.
> 
>> However, he ignored accounting for a rather large struct mem_cgroup
>> therefore I think we can do not worry about 128 bytes of kernfs node.
> 
> Are you referring to the current code (>= v5.18-rc2)? All big structs
> related to mem_cgroup should be accounted. What is ignored?

mm/memcontrol.c:
5079 static struct mem_cgroup *mem_cgroup_alloc(void)
5080 {
5081         struct mem_cgroup *memcg;
...
5086         memcg = kzalloc(struct_size(memcg, nodeinfo, nr_node_ids), GFP_KERNEL);

I think it should allocate at least 2 pages.

>> Primary I mean here struct mem_cgroup allocation in mem_cgroup_alloc().
> 
> Just note that memory controller may not be always enabled so
> cgroup_mkdir != mem_cgroup_alloc().

However if cgroup_mkdir() calls mem_cgroup_alloc() it correctly account huge percpu
allocations but ignores neighbour multipage allocation.

>> However, I think we need to take into account any other distributions called
>> inside cgroup_mkdir: struct cgroup and kernefs node in common part and 
>> any other cgroup-cpecific allocations in other .css_alloc functions.
>> They all can be called from inside container, allocates non-accountable
>> memory and by this way theoretically can be misused.
> 
> Also note that (if you're purely on unified hierachy) you can protect
> against that with cgroup.max.descendants and cgroup.max.depth.

In past OpenVz had a lot of limits for various resources (for example we had a limit 
for iptable rules), but it was very hard to configure them properly.
Finally we  decided to replace all such custom limits by common memory limit.
It isn't important how many resources tries to use container as long as
it doesn't exceed the memory limit.
Such resource limits can be useful, especially to prevent possible misuses.
However sooner or later there will be a legal user who will rest against them.
All you can do in this situation is to recommend him just increase the limit,
that makes the limit senseless.
Memory limits looks much more reasonable and understandable.

Thank you,
	Vasily Averin