linux-kernel - Re: [RFC PATCH 0/8] memory recharging for offline memcgs


Message-ID: <ZLm1ptOYH6F8fGHT@slm.duckdns.org>
Date:   Thu, 20 Jul 2023 12:31:02 -1000
From:   Tejun Heo <tj@...nel.org>
To:     Yosry Ahmed <yosryahmed@...gle.com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...nel.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Shakeel Butt <shakeelb@...gle.com>,
        Muchun Song <muchun.song@...ux.dev>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        Zefan Li <lizefan.x@...edance.com>,
        Yu Zhao <yuzhao@...gle.com>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Iurii Zaikin <yzaikin@...gle.com>,
        "T.J. Mercier" <tjmercier@...gle.com>,
        Greg Thelen <gthelen@...gle.com>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, cgroups@...r.kernel.org
Subject: Re: [RFC PATCH 0/8] memory recharging for offline memcgs

Hello,

On Thu, Jul 20, 2023 at 03:23:59PM -0700, Yosry Ahmed wrote:
> > On its own, AFAICS, I'm not sure the scope of problems it can actually solve
> > is justifiably greater than what can be achieved with simple nesting.
> 
> In our use case nesting is not a viable option. As I said, in a large
> fleet where a lot of different workloads are dynamically being
> scheduled on different machines, and where there is no way of knowing
> what resources are being shared among what workloads, and even if we
> do, it wouldn't be constant, it's very difficult to construct the
> hierarchy with nesting to keep the resources confined.

Hmm... so, usually, the problems we see are resources that are persistent
across different instances of the same application as they may want to share
large chunks of memory like in-memory caches. I get that machines get
different dynamic jobs but unrelated jobs usually don't share huge amounts of
memory, at least in our case. The sharing across them comes down to things
like some common library pages which don't really account for much these
days.

> Keep in mind that the environment is dynamic, workloads are constantly
> coming and going. Even if we find the perfect nesting to appropriately
> scope resources, some rescheduling may render the hierarchy obsolete
> and require us to start over.

Can you please go into more detail on how much memory is shared, and for
what, across unrelated dynamic workloads? That sounds different from other
use cases.

Thanks.

-- 
tejun