linux-kernel - Re: [PATCH mm v5 0/9] memcg: accounting for objects allocated by mkdir, cgroup


Message-ID: <0f8146e3-5865-b7e6-6728-5baada375cf2@openvz.org>
Date:   Fri, 24 Jun 2022 13:40:14 +0300
From:   Vasily Averin <vvs@...nvz.org>
To:     Shakeel Butt <shakeelb@...gle.com>, Michal Hocko <mhocko@...e.com>
Cc:     kernel@...nvz.org, Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux MM <linux-mm@...ck.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Michal Koutný <mkoutny@...e.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Muchun Song <songmuchun@...edance.com>,
        Cgroups <cgroups@...r.kernel.org>
Subject: Re: [PATCH mm v5 0/9] memcg: accounting for objects allocated by
 mkdir, cgroup

On 6/23/22 19:55, Shakeel Butt wrote:
> On Thu, Jun 23, 2022 at 9:07 AM Michal Hocko <mhocko@...e.com> wrote:
>>
>> On Thu 23-06-22 18:03:31, Vasily Averin wrote:
>>> Dear Michal,
>>> do you still have any concerns about this patch set?
>>
>> Yes, I do not think we have concluded this to be really necessary. IIRC
>> Roman would like to see lingering cgroups addressed in not-so-distant
>> future (http://lkml.kernel.org/r/Ypd2DW7id4M3KJJW@carbon) and we already
>> have a limit for the number of cgroups in the tree. So why should we
>> chase after allocations that correspond to the cgroups and somehow try to
>> cap their number via the memory consumption. This looks like something
>> that will get out of sync eventually and it also doesn't seem like the
>> best control to me (comparing to an explicit limit to prevent runaways).
>> --
> 
> Let me give a counter argument to that. On a system running multiple
> workloads, how can the admin come up with a sensible limit for the
> number of cgroups? There will definitely be jobs that require a much
> larger number of sub-cgroups. Asking the admins to dynamically tune
> another tuneable is just asking for more complications. At the end all
> the users would just set it to max.
> 
> I would recommend looking at commit ac7b79fd190b ("inotify, memcg:
> account inotify instances to kmemcg") where there is already a sysctl
> (inotify/max_user_instances) to limit the number of instances but
> there was no sensible way to set that limit on a multi-tenant system.

I've found that MEM_CGROUP_ID_MAX limits memory cgroups only. Other types
of cgroups do not have similar restrictions. Yes, we can set some per-container
limit for all cgroups, but to me that looks like a workaround, while
proper memory accounting looks like the real solution.

Btw, could you please explain why memory cgroups have the MEM_CGROUP_ID_MAX
limit? Why is it required at all, and why was it set to USHRT_MAX? I believe
that in the future it may really be reachable:

Let's set the per-container cgroup limit to some small number,
for example 512, as OpenVz does right now. On a real node with 300
containers we can easily get 100*300 = 30000 cgroups and consume ~3 GB of
memory without any misuse. I think that is too much to leave unaccounted.
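
For reference, the estimate above can be checked with a few lines of
Python. This is only a rough sketch: the per-cgroup cost is not a measured
value but what the ~3 GB figure implies (with GB read as decimal gigabytes),
and the worst case assumes every container fills its 512 limit with memory
cgroups, which the text above does not claim.

```python
# Back-of-the-envelope check of the numbers in the message.
# Assumptions: "~3 GB" is read as 3 * 10**9 bytes, and the per-cgroup
# cost is derived from that figure, not measured.
USHRT_MAX = 65_535            # value of MEM_CGROUP_ID_MAX per the message
CONTAINERS = 300              # containers on one node
CGROUPS_PER_CONTAINER = 100   # ordinary usage assumed in the message
PER_CONTAINER_LIMIT = 512     # per-container cap used as the example
TOTAL_BYTES = 3 * 10**9       # "~3 GB" total

typical_cgroups = CONTAINERS * CGROUPS_PER_CONTAINER
implied_bytes_per_cgroup = TOTAL_BYTES // typical_cgroups

# Hypothetical worst case: every container uses its full limit.
worst_case_cgroups = CONTAINERS * PER_CONTAINER_LIMIT

print("typical cgroups:", typical_cgroups)
print("implied bytes per cgroup:", implied_bytes_per_cgroup)
print("worst case cgroups:", worst_case_cgroups)
print("worst case exceeds USHRT_MAX:", worst_case_cgroups > USHRT_MAX)
```

Under these assumptions the ordinary case is already close to half of
USHRT_MAX, and the per-container limit alone would not keep the node-wide
total below it.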

Thank you,
	Vasily Averin