linux-kernel - Re: [External] Re: [PATCH] mm:zswap: fix zswap entry reclamation failure in two scenarios

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CACSyD1Omro1h2w6bSP0_gB21u=KwFDu+DHjN4+K82H5Pv7yMig@mail.gmail.com>
Date:   Mon, 20 Nov 2023 11:16:11 +0800
From:   Zhongkun He <hezhongkun.hzk@...edance.com>
To:     Chris Li <chrisl@...nel.org>
Cc:     Yosry Ahmed <yosryahmed@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Nhat Pham <nphamcs@...il.com>,
        Seth Jennings <sjenning@...hat.com>,
        Dan Streetman <ddstreet@...e.org>,
        Vitaly Wool <vitaly.wool@...sulko.com>,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>, Ying <ying.huang@...el.com>
Subject: Re: [External] Re: [PATCH] mm:zswap: fix zswap entry reclamation
 failure in two scenarios

>
> Thanks for the clarification. Keep in mind that memory freeing from
> and zswap entry and zpool does not directly translate into page free.
> If the page has other none freed zswap entry or zsmalloc usage, those
> pages will not be free to the system. That is the fragmentation cost I
> was talking about.

Yes, it may need to be compacted.

>
> With this consideration, do you know many extra pages it can release
> back to the system by this patch in your usage case? If the difference
> is very small, it might not be worth the extra complexity to release
> those.
>

The original intention of this patch is to make shrink work properly，
not to release cache and related memory.

> > The original intention of this patch is to solve the problem that
> > shrink_work() fails to reclaim memory in two situations.
> >
> > For case (1),  the zswap_writeback_entry() will failed for the
> > __read_swap_cache_async return NULL because the swap has been
> > freed but cached in swap_slots_cache, so the memory come from
> > the zswap entry struct and compressed page.
> In those cases, if we drop the swap_slots_cache, it will also free
> those zswap entries and compressed pages (zpool), right?
>
> > Count = SWAP_BATCH * ncpu.
>
> That is the upper limit. Not all CPUs have swap batches fully loaded.

Yes.

>
> > Solution: move the zswap_invalidate() out of batches, free it once the swap
> > count equal to 0.
> Per previous discussion, this will have an impact on the
> swap_slot_cache behavior.
> We need some data points for cost benefit analysis.
>
> > For case (2),  the zswap_writeback_entry() will failed for !page_was_allocated
> > because zswap_load will have two copies of the same page in memory
> >   (compressed and uncompressed) after faulting in a page from zswap when
> > zswap_exclusive_loads disabled. The amount of memory is greater but depends
> > on the usage.
>
> That is basically disable the future swap out page IO write
> optimization that skip the write if the page hasn't changed. If the
> system are low on memory, that is justifiable. Again, it seems we can
> have a pass to drop the compressed memory if the swap count is zero
> (and mark page dirty).
>

OK.

> >
> > Why do we need  to release them?
> > Consider this scenario,there is a lot of data cached in memory and zswap,
> > hit the limit，and shrink_worker will fail. The new coming data will be written
>
> Yes, the shrink_worker will need to allocate a page to store
> uncompressed data for write back.
>
> > directly to swap due to zswap_store failure. Should we free the last one
> > to store the latest one in zswap.
>
> The "last one" you mean the most recent zswap entry written into zswap?
> Well, you need to allocate a page to write it out, that is an async process.
> Shrink the zpool now is kind of too late already.
>

The last zswap_entry in zswap_pool->lru.

> > According to the previous discussion, the writeback is inevitable.
> > So I want to make zswap_exclusive_loads_enabled the default behavior
> > or make it the only way to do zswap loads. It only makes sense when
>
> We need some data point for how often we swap it out to zswap again,
> where the zswap out write can be saved by using the existing compressed data.
>
> It is totally possible this page IO write out optimization is not
> worthwhile for zswap.
> We need some data to support that claim.
>

Got it. I will find it.

> > the page is read and no longer dirty. If the page is read frequently, it
> > should stay in cache rather than zswap. The benefit of doing this is
> > very small, i.e. two copies of the same page in memory.
>
> If the benefit of doing this is very small, that seems to be the
> argument against this patch?
> Again we need some data points for cost and benefit analysis.
>

Yes, it is the new idea to make zswap_exclusive_loads_enabled the
default behavior or make it the only way to do zswap loads.

> Chris