Message-ID: <e93596ef-d4eb-5fcb-744c-9ef99cb5a69e@openvz.org>
Date: Tue, 28 Jun 2022 06:43:02 +0300
From: Vasily Averin <vvs@...nvz.org>
To: Roman Gushchin <roman.gushchin@...ux.dev>
Cc: Muchun Song <songmuchun@...edance.com>,
Shakeel Butt <shakeelb@...gle.com>,
Michal Koutný <mkoutny@...e.com>,
Michal Hocko <mhocko@...e.com>, kernel@...nvz.org,
LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Memory Management List <linux-mm@...ck.org>,
Vlastimil Babka <vbabka@...e.cz>,
Cgroups <cgroups@...r.kernel.org>
Subject: Re: [PATCH mm v2] memcg: notify about global mem_cgroup_id space
depletion
On 6/28/22 04:11, Roman Gushchin wrote:
> On Mon, Jun 27, 2022 at 09:49:18AM +0300, Vasily Averin wrote:
>> On 6/27/22 06:23, Muchun Song wrote:
>>> If the caller can see that -ENOSPC is returned by mkdir(), then I
>>> think userspace (perhaps systemd) is the best place to report the
>>> error, instead of the kernel log. Right?
>>
>> Such an incident may occur inside a container.
>> OpenVZ nodes can host 300-400 containers, and the host admin cannot
>> monitor guest logs. The dmesg message is necessary to inform the host
>> owner that the global limit has been reached, otherwise they may
>> continue to believe that there are no problems on the node.
>
> Why is this happening? It's hard to believe someone really needs that
> many cgroups. Is this a case of somebody failing to delete old cgroups?
I do not have direct evidence that any node has actually reached this limit,
but I have seen crash dumps with 30000+ cgroups.
Theoretically, OpenVZ/LXC nodes can host up to several thousand containers
per node. In practice, production nodes with 300-400 containers are a
common thing. I assume that each container can easily use 100-200
memory cgroups, and I consider this normal consumption.
Therefore, I believe the 64K limit is quite achievable in real life.
The primary goal of my patch is to confirm this theory.
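As a rough way to check how close a node already is, here is a small
userspace sketch (my own illustration, not part of the patch). It assumes
cgroup v2 mounted at /sys/fs/cgroup and simply counts directories in the
tree, which gives an upper bound on the number of memory cgroups and thus
on consumed memcg IDs:

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

static long nr_cgroups;

/* nftw() callback: every directory under the cgroup mount is a cgroup. */
static int count_dir(const char *path, const struct stat *sb,
                     int type, struct FTW *ftwbuf)
{
        (void)path; (void)sb; (void)ftwbuf;
        if (type == FTW_D)
                nr_cgroups++;
        return 0;
}

int main(void)
{
        if (nftw("/sys/fs/cgroup", count_dir, 16, FTW_PHYS) < 0) {
                perror("nftw");
                return 1;
        }
        printf("cgroup directories: %ld (memcg ID space: 65535)\n",
               nr_cgroups);
        return 0;
}

On a node with 300-400 containers, each using 100-200 memory cgroups, this
count lands in the 30000-80000 range, i.e. right around the limit.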
> I wanted to say that it's better to introduce a memcg event, but then
> I realized it's probably not worth the wasted space. Is this a common
> scenario?
>
> I think a better approach would be to add a cgroup event (displayed via
> cgroup.events) about reaching the maximum limit of cgroups. E.g.
> cgroup.events::max_nr_reached. Then you can set cgroup.max.descendants
> to some value below memcg_id space size. It's more work, but IMO it's
> a better way to communicate this event. As a bonus, you can easily
> get an idea which cgroup depletes the limit.
For my goal (i.e. just confirming that the 64K limit was reached) this
functionality is too complicated. The confirmation is important because
it should push us to increase the global limit.
However, I think your idea is great. In the long run it would help both
OpenVZ and LXC, and possibly Shakeel as well, to understand real memcg
usage and to set proper limits for containers.
I'm going to prepare such patches, though I'm not sure I'll have enough
time for this task in the near future.
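
For reference, the capping part of your proposal can already be sketched
with existing cgroup v2 files; only the max_nr_reached event itself would
be new. A rough userspace sketch of my understanding (run as root; the
paths and the 60000 value are just an example):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        char buf[256];
        ssize_t n;
        int fd;

        /* Cap live descendant cgroups somewhat below the 64K memcg ID
         * space; 60000 is an arbitrary example value. */
        fd = open("/sys/fs/cgroup/cgroup.max.descendants", O_WRONLY);
        if (fd < 0 || write(fd, "60000\n", 6) < 0) {
                perror("cgroup.max.descendants");
                return 1;
        }
        close(fd);

        /* Today cgroup.stat reports nr_descendants; with the proposed
         * event, cgroup.events of the cgroup that hit the cap would
         * additionally expose something like max_nr_reached. */
        fd = open("/sys/fs/cgroup/cgroup.stat", O_RDONLY);
        if (fd < 0) {
                perror("cgroup.stat");
                return 1;
        }
        n = read(fd, buf, sizeof(buf) - 1);
        if (n > 0) {
                buf[n] = '\0';
                fputs(buf, stdout);
        }
        close(fd);
        return 0;
}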
Thank you,
Vasily Averin