Message-ID: <CAJD7tkZmjVLN0ih3CJFJto8Zvfeb-4+A_9DJpC+iWzVw-Z9yag@mail.gmail.com>
Date:   Fri, 21 Jul 2023 13:37:51 -0700
From:   Yosry Ahmed <yosryahmed@...gle.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...nel.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Shakeel Butt <shakeelb@...gle.com>,
        Muchun Song <muchun.song@...ux.dev>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        Zefan Li <lizefan.x@...edance.com>,
        Yu Zhao <yuzhao@...gle.com>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Iurii Zaikin <yzaikin@...gle.com>,
        "T.J. Mercier" <tjmercier@...gle.com>,
        Greg Thelen <gthelen@...gle.com>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, cgroups@...r.kernel.org
Subject: Re: [RFC PATCH 0/8] memory recharging for offline memcgs

On Fri, Jul 21, 2023 at 12:18 PM Tejun Heo <tj@...nel.org> wrote:
>
> Hello,
>
> On Fri, Jul 21, 2023 at 11:47:49AM -0700, Yosry Ahmed wrote:
> > On Fri, Jul 21, 2023 at 11:26 AM Tejun Heo <tj@...nel.org> wrote:
> > > On Fri, Jul 21, 2023 at 11:15:21AM -0700, Yosry Ahmed wrote:
> > > > On Thu, Jul 20, 2023 at 3:31 PM Tejun Heo <tj@...nel.org> wrote:
> > > > > memory at least in our case. The sharing across them comes down to things
> > > > > like some common library pages which don't really account for much these
> > > > > days.
> > > >
> > > > Keep in mind that even a single page charged to a memcg and used by
> > > > another memcg is sufficient to result in a zombie memcg.
> > >
> > > I mean, yeah, that's a separate issue or rather a subset which isn't all
> > > that controversial. That can be deterministically solved by reparenting to
> > > the parent like how slab is handled. I think the "deterministic" part is
> > > important here. As you said, even a single page can pin a dying cgroup.
> >
> > There are serious flaws with reparenting that I mentioned above. We do
> > it for kernel memory, but that's because we really have no other
> > choice. Oftentimes the memory is not reclaimable and we cannot find an
> > owner for it. This doesn't mean it's the right answer for user memory.
> >
> > The semantics are new compared to normal charging (as opposed to
> > recharging, as I explain below). There is an extra layer of
> > indirection that we did not (as far as I know) measure the impact of.
> > Parents end up with pages that they never used and we have no
> > observability into where it came from. Most importantly, over time
> > user memory will keep accumulating at the root, reducing the accuracy
> > and usefulness of accounting, effectively an accounting leak and
> > reduction of capacity. Memory that is not attributed to any user, aka
> > system overhead.
>
> That really sounds like the setup is missing cgroup layers tracking
> persistent resources. Most of the problems you describe can be solved by
> adding cgroup layers at the right spots which would usually align with the
> logical structure of the system, right?

It is difficult to track down all persistent/shareable resources and
find their users, especially when both the resources and the users
change dynamically. A simple example is text files for a shared
library or sidecar processes that run with different workloads and
need to have their usage charged to the workload, but they may have
memory. For those cases there is no layering that would work. More
practically, sometimes userspace just doesn't even know what exactly
is being shared by whom.

>
> ...
> > I believe recharging is being mis-framed here :)
> >
> > Recharging semantics are not new, it is a shortcut to a process that
> > is already happening that is focused on offline memcgs. Let's take a
> > step back.
>
> Yeah, it does sound better when viewed that way. I'm still not sure what
> extra problems it solves tho. We experienced similar problems but AFAIK all
> of them came down to needing the appropriate hierarchical structure to
> capture how resources are being used on systems.

It solves the problem of zombie memcgs and unaccounted memory. It is
great that in some cases an appropriate hierarchy structure fixes the
problem by accurately capturing how resources are being shared, but in
some cases it's not as straightforward. Recharging attempts to fix the
problem in a way that is more consistent with current semantics and
more appealing than reparenting in terms of rightful ownership.

Some systems are not rebooted for months. Can you imagine how much
memory can be accumulated at the root (escaping all accounting) over
months of reparenting?
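
To make the accumulation concern concrete, here is a toy model. It is
not kernel code. Every name in it (Memcg, reparent, recharge, simulate)
is hypothetical, and it assumes a flat layout in which short-lived jobs
sit directly under the root and each leaves behind some pages that a
long-lived sidecar still uses. It only illustrates where the leftover
charge ends up under each policy.

```python
# Toy accounting model only; names and layout are illustrative assumptions.
class Memcg:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.charged = 0      # pages charged directly to this group
        self.online = True


def reparent(memcg):
    """On offline, move the remaining charge to the nearest online ancestor."""
    memcg.online = False
    target = memcg.parent
    while target.parent is not None and not target.online:
        target = target.parent
    target.charged += memcg.charged
    memcg.charged = 0


def recharge(memcg, user):
    """On offline, move the remaining charge to a group still using the pages."""
    memcg.online = False
    user.charged += memcg.charged
    memcg.charged = 0


def simulate(policy, rounds, leftover_pages):
    root = Memcg("root")
    sidecar = Memcg("sidecar", parent=root)  # long-lived user of shared pages
    for i in range(rounds):
        job = Memcg(f"job-{i}", parent=root)
        # Pages the job was charged for that the sidecar still uses at exit.
        job.charged = leftover_pages
        if policy == "reparent":
            reparent(job)
        else:
            recharge(job, sidecar)
    return root.charged, sidecar.charged


if __name__ == "__main__":
    for policy in ("reparent", "recharge"):
        root_pages, sidecar_pages = simulate(policy, rounds=1000, leftover_pages=256)
        print(policy, "root:", root_pages, "sidecar:", sidecar_pages)
```

In this model, reparenting leaves every leftover page charged to the
root, while recharging attributes the same pages to the group that is
still using them. An extra cgroup layer would change where the
reparented charge lands, but only if such a layer can be identified.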

>
> Thanks.
>
> --
> tejun
