linux-kernel - Re: [PATCH v5 0/6] workload-specific and memory pressure-driven zswap writeback


Message-ID: <CAKEwX=P343G80Bfbf1R+FfSxty763Bo3WCo_Pu0GOuZSJjnxRw@mail.gmail.com>
Date:   Fri, 17 Nov 2023 11:23:42 -0500
From:   Nhat Pham <nphamcs@...il.com>
To:     Chris Li <chrisl@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Domenico Cerasuolo <cerasuolodomenico@...il.com>,
        Yosry Ahmed <yosryahmed@...gle.com>,
        Seth Jennings <sjenning@...hat.com>,
        Dan Streetman <ddstreet@...e.org>,
        Vitaly Wool <vitaly.wool@...sulko.com>, mhocko@...nel.org,
        roman.gushchin@...ux.dev, Shakeel Butt <shakeelb@...gle.com>,
        muchun.song@...ux.dev, linux-mm <linux-mm@...ck.org>,
        kernel-team@...a.com, LKML <linux-kernel@...r.kernel.org>,
        cgroups@...r.kernel.org, linux-doc@...r.kernel.org,
        linux-kselftest@...r.kernel.org, shuah@...nel.org
Subject: Re: [PATCH v5 0/6] workload-specific and memory pressure-driven zswap writeback

On Thu, Nov 16, 2023 at 4:57 PM Chris Li <chrisl@...nel.org> wrote:
>
> Hi Nhat,
>
> I want to share the high-level feedback we discussed here on the
> mailing list as well.
>
> It is my observation that each memcg LRU list can't compare its page
> time order with that of other memcgs.
> It works great when the leaf-level memcg hits the memory limit and you
> want to reclaim from that memcg.
> It works less well under global memory pressure, when you need to reclaim
> from all memcgs. You kind of have to
> scan all child memcgs to find the best page to shrink. It
> is less effective at getting to the most desirable page quickly.
>
> This can benefit from a design similar to MGLRU. This idea is
> suggested by Yu Zhao, credit goes to him not me.
> In other words, the current patch is similar to the memcg page list
> in the pre-MGLRU world. We can have an MGLRU-like
> per-memcg zswap shrink list.

I was gonna summarize the points myself :P But thanks for doing this.
It's your idea so you're more qualified to explain this anyway ;)

I absolutely agree that having a generation-aware cgroup-aware
NUMA-aware LRU is the future way to go. Currently, IIUC, the reclaim logic
selects cgroups in a round-robin-ish manner. It's "fair" from this perspective,
but I also think it's not ideal. As we have discussed, the current list_lru
infrastructure only takes into account intra-cgroup relative recency, not
inter-cgroup relative recency. The recently proposed time-based zswap
reclaim mechanism will provide us with a source of information, but the
overhead of using this might be too high - and it's very zswap-specific.
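
To make the intra- vs inter-cgroup recency point concrete, here is a toy
user-space sketch. It is not kernel code and does not use the list_lru API;
the cgroup names, timestamps, and function names are all made up for
illustration. Each cgroup keeps its own oldest-first list, and round-robin
selection evicts a fresh entry from cgroup B even though cgroup A still holds
older ones:

```python
from collections import deque


def make_lrus():
    # Hypothetical data: per-cgroup lists of store timestamps, oldest first.
    # Every entry in "A" is older than every entry in "B".
    return {"A": deque([1, 2, 3]), "B": deque([10, 11, 12])}


def round_robin_reclaim(lrus, n):
    """Evict up to n entries, taking the oldest entry of each cgroup in turn."""
    evicted = []
    names = list(lrus)
    i = 0
    while len(evicted) < n and any(lrus.values()):
        name = names[i % len(names)]
        i += 1
        if lrus[name]:
            evicted.append((name, lrus[name].popleft()))
    return evicted


def global_oldest_reclaim(lrus, n):
    """Evict up to n entries, always taking the globally oldest list head."""
    evicted = []
    while len(evicted) < n and any(lrus.values()):
        name = min((c for c in lrus if lrus[c]), key=lambda c: lrus[c][0])
        evicted.append((name, lrus[name].popleft()))
    return evicted


print(round_robin_reclaim(make_lrus(), 2))
print(global_oldest_reclaim(make_lrus(), 2))
```

The second function needs a comparable timestamp on every entry, which is
the kind of extra information (and overhead) mentioned above.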

Maybe after this, we should improve zswap reclaim (and perhaps all
list_lru users) by adding generations to list_lru then take generations
into account in the vmscan code. This patch series could be merged
as-is, and once we make list_lru generation-aware, zswap shrinker
will automagically be improved (along with all other list_lru/shrinker
users).
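
A rough sketch of what "adding generations" could mean, again as a user-space
toy: this is neither MGLRU's actual design nor an existing list_lru interface,
and GenLRU, reclaim, and build_example are hypothetical names. Entries are
bucketed by a coarse generation number, and reclaim drains the oldest
generation found across all cgroups before touching newer ones:

```python
from collections import defaultdict, deque


class GenLRU:
    """Per-cgroup list whose entries are bucketed by generation number."""

    def __init__(self):
        self.gens = defaultdict(deque)  # generation -> entries, oldest first

    def add(self, entry, gen):
        self.gens[gen].append(entry)

    def oldest_gen(self):
        live = [g for g, q in self.gens.items() if q]
        return min(live) if live else None

    def pop_from(self, gen):
        return self.gens[gen].popleft()


def reclaim(lrus, n):
    """Evict up to n entries, oldest generation first across all cgroups."""
    evicted = []
    while len(evicted) < n:
        oldest = {name: lru.oldest_gen() for name, lru in lrus.items()}
        oldest = {name: g for name, g in oldest.items() if g is not None}
        if not oldest:
            break  # nothing left to reclaim anywhere
        target = min(oldest.values())
        # Take one entry from each cgroup that still has the target generation.
        for name, gen in oldest.items():
            if gen == target and len(evicted) < n:
                evicted.append((name, lrus[name].pop_from(gen)))
    return evicted


def build_example():
    # Hypothetical data: cgroup A holds two old entries, cgroup B one new entry.
    lrus = {"A": GenLRU(), "B": GenLRU()}
    lrus["A"].add("x", gen=0)
    lrus["A"].add("y", gen=0)
    lrus["B"].add("z", gen=2)
    return lrus


print(reclaim(build_example(), 2))
```

Only a small generation number has to be compared across cgroups here, rather
than a precise per-entry timestamp.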

I don't know enough about the current design of MGLRU to comment
too much further, but let me know if this makes sense, and if you have
objections/other ideas.

And if you have documentation for MGLRU other than its code, could
you please let me know? I'm struggling to find more details about this.


>
>
> Chris
>
> On Wed, Nov 8, 2023 at 6:10 PM Chris Li <chrisl@...nel.org> wrote:
> >
> > On Wed, Nov 8, 2023 at 4:28 PM Nhat Pham <nphamcs@...il.com> wrote:
> > >
> > > Hmm my guess is that I probably sent this out based on an outdated
> > > mm-unstable. There has since been a new zswap selftest merged
> > > to mm-unstable (written by none other than myself - oh the irony), so
> > > maybe it does not apply cleanly anymore with git am.
> >
> > $ git am -3 patches/zswap-pool-lru/0005
> > Applying: selftests: cgroup: update per-memcg zswap writeback selftest
> > Using index info to reconstruct a base tree...
> > M       tools/testing/selftests/cgroup/test_zswap.c
> > Falling back to patching base and 3-way merge...
> > Auto-merging tools/testing/selftests/cgroup/test_zswap.c
> > $ git am -3 patches/zswap-pool-lru/0006
> > Applying: zswap: shrinks zswap pool based on memory pressure
> > error: sha1 information is lacking or useless (mm/zswap.c).
> > error: could not build fake ancestor
> > Patch failed at 0001 zswap: shrinks zswap pool based on memory pressure
> > hint: Use 'git am --show-current-patch=diff' to see the failed patch
> > When you have resolved this problem, run "git am --continue".
> > If you prefer to skip this patch, run "git am --skip" instead.
> > To restore the original branch and stop patching, run "git am --abort".
> >
> > I was able to resolve the conflict on patch 6 by hand though. So I am good now.
> >
> > Thanks
> >
> > Chris