linux-kernel - Re: [PATCH] memcg: add hierarchical effective limits for v2

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CABdmKX2SnGytEHOr7-2V11hczZALy0joOD8uMDBqJZWviXPTTw@mail.gmail.com>
Date: Thu, 6 Feb 2025 11:37:31 -0800
From: "T.J. Mercier" <tjmercier@...gle.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Michal Koutný <mkoutny@...e.com>, 
	Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>, 
	Roman Gushchin <roman.gushchin@...ux.dev>, Muchun Song <muchun.song@...ux.dev>, linux-mm@...ck.org, 
	cgroups@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Meta kernel team <kernel-team@...a.com>
Subject: Re: [PATCH] memcg: add hierarchical effective limits for v2

On Thu, Feb 6, 2025 at 11:09 AM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
>
> On Thu, Feb 06, 2025 at 04:57:39PM +0100, Michal Koutný wrote:
> > Hello Shakeel.
> >
> > On Wed, Feb 05, 2025 at 02:20:29PM -0800, Shakeel Butt <shakeel.butt@...ux.dev> wrote:
> > > Memcg-v1 exposes hierarchical_[memory|memsw]_limit counters in its
> > > memory.stat file which applications can use to get their effective limit
> > > which is the minimum of limits of itself and all of its ancestors.
> >
> > I was fan of equal idea too [1]. The referenced series also tackles
> > change notifications (to make this complete for apps that really want to
> > scale based on the actual limit). I ceased to like it when I realized
> > there can be hierarchies when the effective value cannot be effectively
> > :) determined [2].
> >
> > > This is pretty useful in environments where cgroup namespace is used
> > > and the application does not have access to the full view of the
> > > cgroup hierarchy. Let's expose effective limits for memcg v2 as well.
> >
> > Also, the case for this exposition was never strongly built.
> > Why isn't PSI enough in your case?
> >
>
> Hi Michal,
>
> Oh I totally forgot about your series. In my use-case, it is not about
> dynamically knowning how much they can expand and adjust themselves but
> rather knowing statically upfront what resources they have been given.
> More concretely, these are workloads which used to completely occupy a
> single machine, though within containers but without limits. These
> workloads used to look at machine level metrics at startup on how much
> resources are available.
>
> Now these workloads are being moved to multi-tenant environment but
> still the machine is partitioned statically between the workloads. So,
> these workloads need to know upfront how much resources are allocated to
> them upfront and the way the cgroup hierarchy is setup, that information
> is a bit above the tree.
>
> I hope this clarifies the motivation behind this change i.e. the target
> is not dynamic load balancing but rather upfront static knowledge.
>
> thanks,
> Shakeel
>

We've been thinking of using memcg to both protect (memory.min) and
limit (via memcg OOM) memory hungry apps (games), while informing such
apps of their upper limit so they know how much they can allocate
before risking being killed. Visibility of the cgroup hierarchy isn't
an issue, but having a single file to read instead of walking up the
tree with multiple reads to calculate an effective limit would be
nice. Partial memcg activation in the hierarchy *is* an issue, but
walking up to the closest ancestor with memcg activated is better than
reading all the way up.