linux-kernel - Re: [External] Re: [PATCH] mm:zswap: fix zswap entry reclamation failure in two scenarios

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKEwX=MinEqt_gR+hvyYc1dgU68B1Auw8fypRo7SKFjFK7VHaA@mail.gmail.com>
Date:   Tue, 14 Nov 2023 11:30:35 -0500
From:   Nhat Pham <nphamcs@...il.com>
To:     贺中坤 <hezhongkun.hzk@...edance.com>
Cc:     akpm@...ux-foundation.org, hannes@...xchg.org,
        yosryahmed@...gle.com, sjenning@...hat.com, ddstreet@...e.org,
        vitaly.wool@...sulko.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH] mm:zswap: fix zswap entry reclamation
 failure in two scenarios

On Tue, Nov 14, 2023 at 12:21 AM 贺中坤 <hezhongkun.hzk@...edance.com> wrote:
>
> Thanks for your time, Nhat.
>
> >
> > These two cases should not count as "successful writeback" right?
> >
>
> This is true from the perspective of the writeback itself, but should it
> also be considered successful from the purpose of the writeback,
>  i.e. whether the compressed memory and zswap_entry can be reclaimed?
>
> > I'm slightly biased of course, since my zswap shrinker depends on this
> > as one of the potential signals for over-shrinking - but that aside, I think
> > that this constitutes a failed writeback (i.e should not increment writeback
> > counter, and the limit-based reclaim should try again etc.). If anything,
> > it will make it incredibly confusing for users.
>
> This patch will skip the writeback step，so the writeback counter will not
> be incremented. Currently MAX_RECLAIM_RETRIES is 14, shrink_worker
> will often fail if writeback fails.

Ah my bad, I should have been clearer.

I was looking at the zswap shrinker patch series (specifically the
cgroup-aware LRU patch), which moves the counter update out of
zswap_writeback_entry. If we apply that patch on top of that series, we will
screw up the counter. Should be easily fixable anyway though.

>
> >
> > For instance, we were trying to estimate the number of zswap store
> > fails by subtracting the writeback count from the overall pswpout, and
> > this could throw us off by inflating the writeback count, and deflating
> > the zswap store failure count as a result.
>
> As mentioned above, writeback counter will not be incremented.
>
> >
> > Regarding the second case specifically, I thought that was the point of
> > having zswap_exclusive_loads_enabled disabled - i.e still keeps a copy
> > around in the zswap pool even after a completed zswap_load? Based
> > on the Kconfig documentation:
> >
> > "This avoids having two copies of the same page in memory
> > (compressed and uncompressed) after faulting in a page from zswap.
> > The cost is that if the page was never dirtied and needs to be
> > swapped out again, it will be re-compressed."
> >
>
> Yes，i know the point，in the case of reading, there is no data update,
> so the next swapout does not need to be compressed again.
> Consider this scenario,there is a lot of data cached in memory and zswap,
> hit the limit，and shrink_worker will fail. The new coming data be written
> directly to swap due to zswap_store failure. Should we free the last one
> to store the latest one in zswap.

Ah I think I understand the point of the patch a bit better now.

Essentially, we're invalidating these entries, which does reclaim the
memory used for these compressed objects, but there is no IO involved.
Writeback-less shrinking, if you will.

This will still screw up one of the heuristics I'm using for the zswap
shrinker a bit, but that should be easily fixable with some plumbing.
Same goes for the writeback counter - but depending on the order in
which Andrew apply the patches, you might have to resolve the conflicts
there :)

Other than this objection, I think this optimization makes sense to me:

In the first case, we already freed the swap entry. Might as well also
dropped the zswap entry.

In the second case, we already have another copy in memory, so
dropping the compressed copy to make space for warmer objects
coming into zswap makes sense to me. Might be worth doing a
micro-benchmark to verify this intuition, but I agree that it's more
important to maintain the LRU ordering than any CPU saving from
skipping re-compression.

I would suggest that you should expand on this on the commit log
to make clearer the motivation behind this optimization, if you were
to re-submit this patch for some reason (for e.g to resolve the
aforementioned conflicts with the zswap shrinker series).

But otherwise, LGTM!

Feel free to add the following tag:
Reviewed-by: Nhat Pham <nphamcs@...il.com>