Message-ID: <ZXeTb_ACou7TEVsa@google.com>
Date:   Mon, 11 Dec 2023 14:55:43 -0800
From:   Minchan Kim <minchan@...nel.org>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Chris Li <chrisl@...nel.org>, Nhat Pham <nphamcs@...il.com>,
        akpm@...ux-foundation.org, tj@...nel.org, lizefan.x@...edance.com,
        cerasuolodomenico@...il.com, yosryahmed@...gle.com,
        sjenning@...hat.com, ddstreet@...e.org, vitaly.wool@...sulko.com,
        mhocko@...nel.org, roman.gushchin@...ux.dev, shakeelb@...gle.com,
        muchun.song@...ux.dev, hughd@...gle.com, corbet@....net,
        konrad.wilk@...cle.com, senozhatsky@...omium.org, rppt@...nel.org,
        linux-mm@...ck.org, kernel-team@...a.com,
        linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
        david@...t.cz, Kairui Song <kasong@...cent.com>,
        Zhongkun He <hezhongkun.hzk@...edance.com>
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling

On Fri, Dec 08, 2023 at 10:42:29PM -0500, Johannes Weiner wrote:
> On Fri, Dec 08, 2023 at 03:55:59PM -0800, Chris Li wrote:
> > I can give you three usage cases right now:
> > 1) Google's production kernel uses SSD-only swap; it is currently
> > in pilot. This is not expressible by memory.zswap.writeback alone.
> > You can set memory.zswap.max = 0 and memory.zswap.writeback = 1 to
> > get an SSD-backed swapfile, but the whole thing feels very clunky:
> > when what you really want is SSD-only swap, you have to do all this
> > zswap config dance. Google has an internal memory.swapfile feature
> > that implements a per-cgroup swap file type of "zswap only", "real
> > swap file only", "both", or "none" (the exact keywords might be
> > different), and it has been running in production for almost 10
> > years. The need for more than zswap-type per-cgroup control is
> > really there.
> 
> We use regular swap on SSD without zswap just fine. Of course it's
> expressible.
> 
> On dedicated systems, zswap is disabled in sysfs. On shared hosts
> where it's determined based on which workload is scheduled, zswap is
> generally enabled through sysfs, and individual cgroup access is
> controlled via memory.zswap.max - which is what this knob is for.
> 
> This is analogous to enabling swap globally, and then opting
> individual cgroups in and out with memory.swap.max.
> 
> So this usecase is very much already supported, and it's expressed in
> a way that's pretty natural for how cgroups express access and lack of
> access to certain resources.
> 
> I don't see how memory.swap.type or memory.swap.tiers would improve
> this in any way. On the contrary, it would overlap and conflict with
> existing controls to manage swap and zswap on a per-cgroup basis.
> 
> > 2) As indicated by this discussion, Tencent has a use case for SSD
> > swap with hard disk swap as overflow.
> > https://lore.kernel.org/linux-mm/20231119194740.94101-9-ryncsn@gmail.com/
> > +Kairui
> 
> Multiple swap devices for round robin or with different priorities
> aren't new, they have been supported for a very, very long time. So
> far nobody has proposed to control the exact behavior on a per-cgroup
> basis, and I didn't see anybody in this thread asking for it either.
> 
> So I don't see how this counts as an obvious and automatic usecase for
> memory.swap.tiers.
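For reference, that long-standing behavior is configured with swap
priorities: areas of equal priority are used round-robin, and a
lower-priority device only takes pages once the higher-priority ones are
full. The device paths below are illustrative:

    # SSD as the preferred tier, HDD as overflow
    swapon -p 2 /dev/nvme0n1p3
    swapon -p 1 /dev/sda2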
> 
> > 3) Android has some fancy swap ideas driven by these patches:
> > https://lore.kernel.org/linux-mm/20230710221659.2473460-1-minchan@kernel.org/
> > They got shot down due to the removal of frontswap, but the use case
> > and product requirement are there.
> > +Minchan
> 
> This looks like an optimization for zram to bypass the block layer and
> hook directly into the swap code. Correct me if I'm wrong, but this
> doesn't appear to have anything to do with per-cgroup backend control.

Hi Johannes,

I haven't been following the thread closely, but I noticed the discussion
about potential use cases for zram with memcg.

One interesting idea I have is to implement a swap controller per cgroup.
This would allow us to tailor the zram swap behavior to the specific needs of
different groups.

For example, Group A, which is sensitive to swap latency, could use zram swap
with a fast compression algorithm, even at the cost of some compression ratio.
This would prioritize quick access to swapped-out data, at the expense of more
memory used for the compressed pages.

On the other hand, Group B, which can tolerate higher swap latency, could benefit
from a slower compression setting that achieves a higher compression ratio.
This would maximize memory efficiency at the cost of slightly slower data access.
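A rough sketch of what I have in mind (the zram sysfs knobs below already
exist, assuming lz4 and zstd are built in; the per-cgroup
memory.swap.device binding is hypothetical, just for illustration):

    # two zram devices with different compressors
    modprobe zram num_devices=2

    # zram0: fast compressor for latency-sensitive Group A
    echo lz4 > /sys/block/zram0/comp_algorithm
    echo 8G > /sys/block/zram0/disksize
    mkswap /dev/zram0 && swapon /dev/zram0

    # zram1: higher-ratio compressor for latency-tolerant Group B
    echo zstd > /sys/block/zram1/comp_algorithm
    echo 8G > /sys/block/zram1/disksize
    mkswap /dev/zram1 && swapon /dev/zram1

    # hypothetical per-cgroup binding (not an existing interface)
    echo zram0 > /sys/fs/cgroup/group-a/memory.swap.device
    echo zram1 > /sys/fs/cgroup/group-b/memory.swap.device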

This approach could provide a more nuanced and flexible way to manage swap usage
within different cgroups.
