Message-ID: <CAF8kJuOR+POJx4wT7jbZkoiwGjrjVH4uuJ5GgoWBOf20JUf9qw@mail.gmail.com>
Date:   Sun, 19 Nov 2023 00:23:49 -0800
From:   Chris Li <chrisl@...nel.org>
To:     Zhongkun He <hezhongkun.hzk@...edance.com>
Cc:     Yosry Ahmed <yosryahmed@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Nhat Pham <nphamcs@...il.com>,
        Seth Jennings <sjenning@...hat.com>,
        Dan Streetman <ddstreet@...e.org>,
        Vitaly Wool <vitaly.wool@...sulko.com>,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>, Ying <ying.huang@...el.com>
Subject: Re: [External] Re: [PATCH] mm:zswap: fix zswap entry reclamation
 failure in two scenarios

Hi Zhongkun,


On Fri, Nov 17, 2023 at 5:46 PM Zhongkun He
<hezhongkun.hzk@...edance.com> wrote:
> > Can you help me understand how much memory you can free from this
> > patch? For example, are we talking about a few pages or a few GB?
> >
> > Where does the freed memory come from?
> > If the memory comes from the zswap entry struct: due to slab
> > allocator fragmentation, it would take a lot of zswap entries to
> > reclaim a meaningful amount of memory from the slab allocator.
> >
> > If the memory comes from the swap cached pages, that would be much
> > more meaningful. But that is not what this patch is doing, right?
> >
> > Chris
>
> It's my bad for putting the two cases together. The memory released
> in both cases comes from the zswap entry struct and the zswap
> compressed page.

Thanks for the clarification. Keep in mind that freeing memory from
zswap entry structs and the zpool does not directly translate into
freed pages. If a page still holds other live zswap entries or
zsmalloc allocations, it cannot be returned to the system. That is
the fragmentation cost I was talking about.
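
To put rough numbers on it (illustrative, not measured): if struct
zswap_entry is on the order of 64 bytes, one 4 KiB slab page holds
around 64 entries. Freeing 63 of them returns nothing to the system;
the slab page only goes back when the last entry on it dies. The same
math applies to compressed objects packed into a zsmalloc page.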

With that in mind, do you know how many extra pages this patch can
release back to the system in your use case? If the difference is
very small, it might not be worth the extra complexity of releasing
them.

> The original intention of this patch is to solve the problem that
> shrink_worker() fails to reclaim memory in two situations.
>
> For case (1), zswap_writeback_entry() will fail because
> __read_swap_cache_async() returns NULL: the swap entry has been
> freed but is still cached in swap_slots_cache. So the memory comes
> from the zswap entry struct and the compressed page.

In those cases, if we drop the swap_slots_cache, it will also free
those zswap entries and compressed pages (zpool), right?

> Count = SWAP_BATCH * ncpu.

That is the upper limit. Not all CPUs have swap batches fully loaded.
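
As a rough bound (assuming SWAP_BATCH is still 64, per
include/linux/swap.h): on a 96-CPU machine that is at most
64 * 96 = 6144 cached slots. Even if every slot pinned a zswap entry
plus a ~1 KiB compressed object, that is on the order of 6 MiB,
before the fragmentation cost above is paid.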

> Solution: move the zswap_invalidate() out of the batch and free the
> entry as soon as the swap count drops to 0.

Per the previous discussion, this will have an impact on the
swap_slots_cache behavior. We need some data points for a
cost/benefit analysis.
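
Just so we are looking at the same thing, a rough sketch of that
direction as I understand it (not the actual patch; placement and
locking would need care):

    /* mm/swap_slots.c -- sketch only */
    void free_swap_slot(swp_entry_t entry)
    {
            /*
             * The swap count is already zero here; the slot is only
             * parked in the per-CPU cache for reuse. Drop the zswap
             * copy now instead of waiting for the batched
             * swapcache_free_entries() to reach swap_range_free().
             */
            zswap_invalidate(swp_type(entry), swp_offset(entry));

            /* ... existing per-CPU slot caching unchanged ... */
    }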

> For case (2), zswap_writeback_entry() will fail on !page_was_allocated
> because, with zswap_exclusive_loads disabled, zswap_load() leaves two
> copies of the same page in memory (compressed and uncompressed) after
> faulting a page in from zswap. The amount of memory is larger here,
> but depends on the usage.

That basically disables the future swap-out page IO write
optimization, which skips the write if the page hasn't changed. If
the system is low on memory, that is justifiable. Again, it seems we
could have a pass that drops the compressed memory when the swap
count is zero (and marks the page dirty).
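
Something like this in the load path, very roughly (sketch only;
zswap_under_memory_pressure() is a hypothetical placeholder for
whatever trigger we would pick):

    /* in the zswap load path -- sketch, not the current code */
    if (zswap_exclusive_loads_enabled ||
        zswap_under_memory_pressure()) {    /* hypothetical knob */
            /* drop the compressed copy ... */
            zswap_invalidate_entry(tree, entry);
            /* ... and force a real write on the next swap-out */
            SetPageDirty(page);
    }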

>
> Why do we need to release them?
> Consider this scenario: there is a lot of data cached in memory and
> zswap, the limit is hit, and shrink_worker() will fail. The newly
> incoming data will be written

Yes, shrink_worker() will need to allocate a page to store the
uncompressed data for writeback.
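
Roughly:

    shrink_worker()
      -> zswap_reclaim_entry()
         -> zswap_writeback_entry()
            -> __read_swap_cache_async()  /* allocates the page */

Under the same memory pressure that triggered the shrink, that
allocation itself can fail.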

> directly to swap due to zswap_store() failure. Should we free the
> last one to store the latest one in zswap?

By "the last one", do you mean the most recent zswap entry written
into zswap? Well, you would need to allocate a page to write it out,
and that is an async process. Shrinking the zpool at that point is
kind of too late already.

> According to the previous discussion, the writeback is inevitable.
> So I want to make zswap_exclusive_loads_enabled the default behavior
> or make it the only way to do zswap loads. It only makes sense when

We need some data points on how often such a page is swapped out to
zswap again, i.e. how often the zswap write can actually be saved by
reusing the existing compressed data.

It is entirely possible that this page IO write-out optimization is
not worthwhile for zswap, but we need some data to support that
claim.

> the page is read and no longer dirty. If the page is read frequently,
> it should stay in cache rather than in zswap. The benefit of keeping
> the compressed copy is very small, while the cost is two copies of
> the same page in memory.

If the benefit of doing this is very small, that seems to be an
argument against this patch?
Again, we need some data points for a cost/benefit analysis.

Chris
