Message-ID: <Z6OucWdMtuuVLizY@google.com>
Date: Wed, 5 Feb 2025 18:31:13 +0000
From: Roman Gushchin <roman.gushchin@...ux.dev>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com>, linux-mm@...ck.org,
Shakeel Butt <shakeel.butt@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, Tejun Heo <tj@...nel.org>,
Michal Koutný <mkoutny@...e.com>,
Michal Hocko <mhocko@...nel.org>,
Muchun Song <muchun.song@...ux.dev>,
Allen Pais <apais@...ux.microsoft.com>,
Yosry Ahmed <yosryahmed@...gle.com>
Subject: Re: A path forward to cleaning up dying cgroups?
On Wed, Feb 05, 2025 at 01:08:42PM -0500, Johannes Weiner wrote:
> On Wed, Feb 05, 2025 at 12:50:19PM -0500, Hamza Mahfooz wrote:
> > Cc: Shakeel Butt <shakeel.butt@...ux.dev>
> >
> > On 2/5/25 12:48, Hamza Mahfooz wrote:
> > > I was just curious as to what the status of the issue described in [1]
> > > is. It appears that the last time someone took a stab at it was in [2].
>
> If memory serves, the sticking point was whether pages should indeed
> be reparented on cgroup death, or whether they could be moved
> arbitrarily to other cgroups that are still using them.
>
> It's a bit unfortunate, because the reparenting patches were tested
> and reviewed, and the arbitrary recharging was just an idea that
> ttbomk nobody seriously followed up on afterwards.
>
> We also recently removed the charge moving code from cgroup1, along
> with the subtle page access/locking/accounting rules it imposed on the
> rest of the MM. I'm doubtful there is much appetite in either camp for
> bringing this back.
>
> So I would still love to see Muchun's patches merged. They fix a
> seemingly universally experienced operational issue in memcg, and we
> shouldn't hold it up unless somebody actually posts alternative code.
>
> Thoughts?
I don't have a strong opinion here. Reparenting is clearly not perfect,
but I agree that we don't have any better solutions, only vague ideas.
I believe Muchun's code would require some refresh, but generally is fine
to merge.
This all comes down to the handling of memory shared between cgroups.
Sharing can be spatial (2 or more simultaneously existing cgroups) or
temporal (a cgroup is deleted and recreated, and the workload tries to
reuse old pages). Reparenting turns temporal sharing into spatial sharing.
It helps with dying cgroups, but comes at the cost of permanently wrong
accounting and issues with memory protection.
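To make the accounting point concrete, here is a toy model in plain
Python. It is only a sketch, not kernel code: the Cgroup class, its
methods and the page names are all made up for illustration, and real
memcg charging is far more involved.

```python
class Cgroup:
    """Toy stand-in for a memory cgroup (hypothetical, not a kernel API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.pages = set()  # pages charged directly to this group

    def charge(self, page):
        self.pages.add(page)

    def delete_with_reparenting(self):
        # On deletion, hand all remaining charges to the parent so that
        # nothing keeps the dying group pinned.
        self.parent.pages |= self.pages
        self.pages.clear()


root = Cgroup("root")
job = Cgroup("job", parent=root)
job.charge("pagecache:0")
job.charge("pagecache:1")

# The workload's cgroup is deleted and then recreated under the same name.
job.delete_with_reparenting()
job2 = Cgroup("job", parent=root)

# The recreated workload reuses the old pages, but they are now charged
# to the parent, so the new group's own accounting does not see them.
reused = {"pagecache:0", "pagecache:1"}
charged_to_job2 = reused & job2.pages
charged_to_root = reused & root.pages
```

In this model the dying group is left with no charges and can go away,
but the pages that job2 keeps using are charged to root for as long as
they live, which is the temporal-to-spatial conversion described above.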
Thanks!