[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230721204408.GA1033322@cmpxchg.org>
Date: Fri, 21 Jul 2023 16:44:08 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Tejun Heo <tj@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeelb@...gle.com>,
Muchun Song <muchun.song@...ux.dev>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Zefan Li <lizefan.x@...edance.com>,
Yu Zhao <yuzhao@...gle.com>,
Luis Chamberlain <mcgrof@...nel.org>,
Kees Cook <keescook@...omium.org>,
Iurii Zaikin <yzaikin@...gle.com>,
"T.J. Mercier" <tjmercier@...gle.com>,
Greg Thelen <gthelen@...gle.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, cgroups@...r.kernel.org
Subject: Re: [RFC PATCH 0/8] memory recharging for offline memcgs
On Fri, Jul 21, 2023 at 11:47:49AM -0700, Yosry Ahmed wrote:
> On Fri, Jul 21, 2023 at 11:26 AM Tejun Heo <tj@...nel.org> wrote:
> >
> > Hello,
> >
> > On Fri, Jul 21, 2023 at 11:15:21AM -0700, Yosry Ahmed wrote:
> > > On Thu, Jul 20, 2023 at 3:31 PM Tejun Heo <tj@...nel.org> wrote:
> > > > memory at least in our case. The sharing across them comes down to things
> > > > like some common library pages which don't really account for much these
> > > > days.
> > >
> > > Keep in mind that even a single page charged to a memcg and used by
> > > another memcg is sufficient to result in a zombie memcg.
> >
> > I mean, yeah, that's a separate issue or rather a subset which isn't all
> > that controversial. That can be deterministically solved by reparenting to
> > the parent like how slab is handled. I think the "deterministic" part is
> > important here. As you said, even a single page can pin a dying cgroup.
>
> There are serious flaws with reparenting that I mentioned above. We do
> it for kernel memory, but that's because we really have no other
> choice. Oftentimes the memory is not reclaimable and we cannot find an
> owner for it. This doesn't mean it's the right answer for user memory.
>
> The semantics are new compared to normal charging (as opposed to
> recharging, as I explain below). There is an extra layer of
> indirection that we did not (as far as I know) measure the impact of.
> Parents end up with pages that they never used and we have no
> observability into where it came from. Most importantly, over time
> user memory will keep accumulating at the root, reducing the accuracy
> and usefulness of accounting, effectively an accounting leak and
> reduction of capacity. Memory that is not attributed to any user, aka
> system overhead.
Reparenting has been the behavior since the first iteration of cgroups
in the kernel. The initial implementation would loop over the LRUs and
reparent pages synchronously during rmdir. This had some locking
issues, so we switched to the current implementation of just leaving
the zombie memcg behind but neutralizing its controls.
Thanks to Roman's objcg abstraction, we can now go back to the old
implementation of directly moving pages up to avoid the zombies.
However, these were pure implementation changes. The user-visible
semantics never varied: when you delete a cgroup, any leftover
resources are subject to control by the remaining parent cgroups.
Don't remove control domains if you still need to control resources.
But none of this is new or would change in any way! Neutralizing
controls of a zombie cgroup results in the same behavior and
accounting as linking the pages to the parent cgroup's LRU!
The only thing that's new is the zombie cgroups. We can fix that by
effectively going back to the earlier implementation, but thanks to
objcg without the locking problems.
I just wanted to address this, because your description/framing of
reparenting strikes me as quite wrong.
Powered by blists - more mailing lists