linux-kernel - Re: [External] Re: [PATCH] mm:zswap: fix zswap entry reclamation failure in two scenarios

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKEwX=O0eNmoFRsnRXpkY55UGHBOiGL2aQW6um8Kq5hgGH=c_A@mail.gmail.com>
Date:   Sat, 18 Nov 2023 13:43:52 -0500
From:   Nhat Pham <nphamcs@...il.com>
To:     Zhongkun He <hezhongkun.hzk@...edance.com>
Cc:     Chris Li <chrisl@...nel.org>, Yosry Ahmed <yosryahmed@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Seth Jennings <sjenning@...hat.com>,
        Dan Streetman <ddstreet@...e.org>,
        Vitaly Wool <vitaly.wool@...sulko.com>,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>, Ying <ying.huang@...el.com>
Subject: Re: [External] Re: [PATCH] mm:zswap: fix zswap entry reclamation
 failure in two scenarios

On Fri, Nov 17, 2023 at 8:46 PM Zhongkun He
<hezhongkun.hzk@...edance.com> wrote:
>
> Hi Chris, thanks for your time.
>
> >
> > On Fri, Nov 17, 2023 at 1:56 AM Zhongkun He
> > <hezhongkun.hzk@...edance.com> wrote:
> > > Hi Chris, thanks for your feedback.  I have the same concerns,
> > > maybe we should just move the zswap_invalidate() out of batches,
> > > as Yosry mentioned above.
> >
> > As I replied in the previous email, I just want to understand the
> > other side effects of the change better.
> >
> > To me, this patching is actually freeing the memory that does not
> > require actual page IO write from zswap. Which means the memory is
> > from some kind of cache. It would be interesting if we can not
> > complicate the write back path further. Instead, we can drop those
> > memories from the different cache if needed. I assume those caches are
> > doing something useful in the common case. If not, we should have a
> > patch to remove these caches instead.  Not sure how big a mess it will
> > be to implement separate the write and drop caches.
> >
> > While you are here, I have some questions for you.
> >
> > Can you help me understand how much memory you can free from this
> > patch? For example, are we talking about a few pages or a few GB?
> >
> > Where does the freed memory come from?
> > If the memory comes from zswap entry struct. Due to the slab allocator
> > fragmentation. It would take a lot of zswap entries to have meaningful
> > memory reclaimed from the slab allocator.
> >
> > If the memory comes from the swap cached pages, that would be much
> > more meaningful. But that is not what this patch is doing, right?
> >
> > Chris
>
> It's my bad for putting two cases together. The memory released in both
> cases comes from zswap entry struct and zswap compressed page.
>
> The original intention of this patch is to solve the problem that
> shrink_work() fails to reclaim memory in two situations.
>
> For case (1),  the zswap_writeback_entry() will failed for the
> __read_swap_cache_async return NULL because the swap has been
> freed but cached in swap_slots_cache, so the memory come from
> the zswap entry struct and compressed page.
> Count = SWAP_BATCH * ncpu.
> Solution: move the zswap_invalidate() out of batches, free it once the swap
> count equal to 0.
>
> For case (2),  the zswap_writeback_entry() will failed for !page_was_allocated
> because zswap_load will have two copies of the same page in memory
>   (compressed and uncompressed) after faulting in a page from zswap when
> zswap_exclusive_loads disabled. The amount of memory is greater but depends
> on the usage.
>
> Why do we need  to release them?
> Consider this scenario,there is a lot of data cached in memory and zswap,
> hit the limit，and shrink_worker will fail. The new coming data will be written
> directly to swap due to zswap_store failure. Should we free the last one
> to store the latest one in zswap.

Shameless plug: zswap will much less likely hit the limit (global or
cgroup) with the shrinker enabled ;) It will proactively reclaim the
objects way ahead of the limit.

It comes with its own can of worms, of course - it's unlikely to work
for all workloads in its current form, but perhaps worth experimenting
with/improved upon?


>
> According to the previous discussion, the writeback is inevitable.
> So I want to make zswap_exclusive_loads_enabled the default behavior
> or make it the only way to do zswap loads. It only makes sense when
> the page is read and no longer dirty. If the page is read frequently, it
> should stay in cache rather than zswap. The benefit of doing this is
> very small, i.e. two copies of the same page in memory.
>
> Thanks again.