Message-ID: <rha4tmnnrhncn2ryoml2hbu5hxt3qnbg2rurl6tkssnegrc5wn@isui3jn3cu4h>
Date: Tue, 22 Apr 2025 08:40:14 -0700
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: Christian Brauner <brauner@...nel.org>
Cc: Michal Koutný <mkoutny@...e.com>,
Andrew Morton <akpm@...ux-foundation.org>, Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>, Roman Gushchin <roman.gushchin@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>, Yosry Ahmed <yosry.ahmed@...ux.dev>, Tejun Heo <tj@...nel.org>,
Greg Thelen <gthelen@...gle.com>, linux-mm@...ck.org, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, Meta kernel team <kernel-team@...a.com>
Subject: Re: [PATCH v2] memcg: introduce non-blocking limit setting option

On Tue, Apr 22, 2025 at 11:48:23AM +0200, Christian Brauner wrote:
> On Tue, Apr 22, 2025 at 11:31:23AM +0200, Michal Koutný wrote:
> > On Tue, Apr 22, 2025 at 11:23:17AM +0200, Christian Brauner <brauner@...nel.org> wrote:
> > > As written this isn't restricted to admin processes though, no? So any
> > > unprivileged container can open that file O_NONBLOCK and avoid
> > > synchronous reclaim?
> > >
> > > Which might be fine I have no idea but it's something to explicitly
> > > point out
> >
> > It occurred to me as well but I think this is fine -- changing the
> > limits of a container is (should be) a privileged operation already
> > (ensured by file permissions at opening).
> > IOW, this doesn't allow bypassing the limits to anyone who couldn't have
> > been able to change them already.
>
> Hm, can you explain what you mean by a privileged operation here? If I
> have nested containers with user namespaces with delegated cgroup trees,
> i.e., chowned to them and then some PID 1 or privileged container
> _within the user namespace_ lowers the limit and uses O_NONBLOCK then it
> won't trigger synchronous reclaim. Again, this might all be fine I'm
> just trying to understand.

I think Michal's point, which I agree with, is that if a process has the
privilege to change the limit of a cgroup, then it is fine for that process
to use O_NONBLOCK to avoid synchronous reclaim. This new functionality does
not enable anyone to bypass their limits.

In your example of PID 1 or a privileged container: yes, with O_NONBLOCK
the limit updater will not trigger synchronous reclaim, but whoever is
running in that cgroup will eventually hit synchronous reclaim on their
next charge request.
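
To make the updater side concrete, here is a minimal userspace sketch of what
"open that file O_NONBLOCK" looks like. The helper name and the example path
are hypothetical, and which limit files honor the flag depends on the patch
under discussion; the only point is that the usual file permission check
happens at open time, and the flag is just an extra bit on that same open.

```python
import os


def set_limit_nonblocking(limit_file: str, value: str) -> int:
    """Write a new limit to a cgroup limit file opened with O_NONBLOCK.

    Hypothetical helper for illustration only. The open() call is where
    file permissions are enforced, so a caller who could not change the
    limit before cannot change it with O_NONBLOCK either.
    """
    fd = os.open(limit_file, os.O_WRONLY | os.O_NONBLOCK)
    try:
        # The kernel parses the written string; a trailing newline is
        # what `echo` would send.
        return os.write(fd, f"{value}\n".encode())
    finally:
        os.close(fd)


# Example (assumed path, requires write permission on the file):
# set_limit_nonblocking("/sys/fs/cgroup/mygroup/memory.max", "1073741824")
```

With the proposed behavior, a call like this would return without the writer
doing the reclaim itself; tasks in the cgroup would then do it on their next
charge.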