linux-kernel - Re: [External] Re: [PATCH] mm: zswap: fix the lack of page lru flag in zswap_writeback

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKEwX=OsTQCJd12S3NajRMRy_s3q3yGFpS7S=_3-yOYK6+ezzA@mail.gmail.com>
Date: Thu, 11 Jan 2024 11:25:18 -0800
From: Nhat Pham <nphamcs@...il.com>
To: Zhongkun He <hezhongkun.hzk@...edance.com>
Cc: Yosry Ahmed <yosryahmed@...gle.com>, akpm@...ux-foundation.org, hannes@...xchg.org, 
	sjenning@...hat.com, ddstreet@...e.org, vitaly.wool@...sulko.com, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	Chris Li <chrisl@...nel.org>
Subject: Re: [External] Re: [PATCH] mm: zswap: fix the lack of page lru flag
 in zswap_writeback_entry

On Wed, Jan 10, 2024 at 7:49 PM Zhongkun He
<hezhongkun.hzk@...edance.com> wrote:
>
> >
> > This sounds dangerous. This is going to introduce a rather large
> > unexpected side effect - we're changing the readahead behavior in a
> > seemingly small zswap optimization. In fact, I'd argue that if we do
> > this, the readahead behavior change will be the "main effect", and the
> > zswap-side change would be a "happy consequence". We should run a lot
> > of benchmarking and document the change extensively if we pursue this
> > route.
> >
>
> I agree with the unexpected side effect,  and here I need
> to clarify the original intention of this patch.Please see the memory
> offloading steps below.
>
>
> memory      zswap(reclaim)          memory+swap (writeback)
> 1G                 0.5G                        1G(tmp memory) + 1G（swap）
>
> If the decompressed memory cannot be released in time,
> zswap's writeback has great side effects(mostly clod pages).
> On the one hand, the memory space has not been reduced,
> but has increased (from 0.5G->1G).
> At the same time, it is not put the pages to the tail of the lru.
> When the memory is insufficient, other pages will be squeezed out
> and released early.
> With this patch， we can put the tmp pages to the tail and reclaim it
> in time when the memory is insufficient or actively reclaimed.
> So I think this patch makes sense and hope it can be fixed with a
> suitable approaches.

Makes sense to me. IIUC, that's the original intention behind calling
SetPageReclaim() - unfortunately that doesn't work :) And IIRC, your
original attempt shows reduction in swap usage (albeit at the cost of
performance regression), which means we're onto something. I believe
that the folio_lru_add_tail() approach will work :)

Please include a version of the clarification paragraph above in your
later version to explain the goal of the optimization, along with
suitable benchmark numbers to show the effect (such as minimal change
in performance, and reduction in some metrics). Maybe include the link
to the original patch that introduces SetPageReclaim() too, to show
the motivation behind all of this :) It'd be nice to have all the
contexts readily available, in case we need to revisit this in the
future (as was the case with the SetPageReclaim() here).

>
> >
> > Unless some page flag/readahead expert can confirm that the first
> > option is safe, my vote is on this option. I mean, it's fairly minimal
> > codewise, no? Just a bunch of plumbing. We can also keep the other
> > call sites intact if we just rename the old versions - something along
> > the line of:
> >
> > __read_swap_cache_async_head(..., bool add_to_lru_head)
> > {
> > ...
> > if (add_to_lru_head)
> >   folio_add_lru(folio)
> > else
> >   folio_add_lru_tail(folio);
> > }
> >
> > __read_swap_cache_async(...)
> > {
> >    return __read_swap_cache_async_tail(..., true);
> > }
> >
> > A bit boilerplate? Sure. But this seems safer, and I doubt it's *that*
> > much more work.
> >
>
> Yes， agree. I will try it again.

Look forward to seeing it! Thanks for your patience and for working on this.