linux-kernel - Re: [External] Re: [PATCH] mm: zswap: fix the lack of page lru flag in zswap_writeback

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJD7tkb2oda=4f0s8w8xn+t_TM1b2Q_otbb86VPQ9R1m2uqDTA@mail.gmail.com>
Date: Mon, 8 Jan 2024 15:12:45 -0800
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Nhat Pham <nphamcs@...il.com>
Cc: Zhongkun He <hezhongkun.hzk@...edance.com>, akpm@...ux-foundation.org, 
	hannes@...xchg.org, sjenning@...hat.com, ddstreet@...e.org, 
	vitaly.wool@...sulko.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	Chris Li <chrisl@...nel.org>, weijie.yang@...sung.com
Subject: Re: [External] Re: [PATCH] mm: zswap: fix the lack of page lru flag
 in zswap_writeback_entry

On Sun, Jan 7, 2024 at 1:59 PM Nhat Pham <nphamcs@...il.com> wrote:
>
> On Sun, Jan 7, 2024 at 1:29 PM Nhat Pham <nphamcs@...il.com> wrote:
> >
> > On Fri, Jan 5, 2024 at 6:10 AM Zhongkun He <hezhongkun.hzk@...edance.com> wrote:
> > >
> > > > > There is another option here, which is not to move the page to the
> > > > > tail of the inactive
> > > > > list after end_writeback and delete the following code in
> > > > > zswap_writeback_entry(),
> > > > > which did not work properly. But the pages will not be released first.
> > > > >
> > > > > /* move it to the tail of the inactive list after end_writeback */
> > > > > SetPageReclaim(page);
> >
> >
> > Ok, so I took a look at the patch that originally introduced this
> > piece of logic:
> >
> > https://github.com/torvalds/linux/commit/b349acc76b7f65400b85abd09a5379ddd6fa5a97
> >
> > Looks like it's not for the sake of correctness, but only as a
> > best-effort optimization (reducing page scanning). If it doesn't bring
> > any benefit (i.e due to the newly allocated page still on the cpu
> > batch), then we can consider removing it. After all, if you're right
> > and it's not really doing anything here - why bother. Perhaps we can
> > replace this with some other mechanism to avoid it being scanned for
> > reclaim.
>
> For instance, we can grab the local lock, look for the folio in the
> add batch and take the folio off it, then add it to the rotate batch
> instead? Not sure if this is doable within folio_rotate_reclaimable(),
> or you'll have to manually perform this yourself (and remove the
> PG_reclaim flag set here so that folio_end_writeback() doesn't try to
> handle it).
>
> There is still some overhead with this, but at least we don't have to
> *drain everything* (which looks like what's lru_add_drain() ->
> lru_add_drain_cpu() is doing). The latter sounds expensive and
> unnecessary, whereas this is just one element addition and one element
> removal - and if IIUC the size of the per-cpu add batch is capped at
> 15, so lookup + removal (if possible) shouldn't be too expensive?
>
> Just throwing ideas out there :)

Sorry for being late to the party. It seems to me that all of this
hassle can be avoided if lru_add_fn() did the right thing in this case
and added the folio to the tail of the lru directly. I am no expert in
how the page flags work here, but it seems like we can do something
like this in lru_add_fn():

if (folio_test_reclaim(folio))
    lruvec_add_folio_tail(lruvec, folio);
else
    lruvec_add_folio(lruvec, folio);

I think the main problem with this is that PG_reclaim is an alias to
PG_readahead, so readahead pages will also go to the tail of the lru,
which is probably not good.

A more intrusive alternative is to introduce a folio_lru_add_tail()
variant that always adds pages to the tail, and optionally call that
from __read_swap_cache_async() instead of folio_lru_add() based on a
new boolean argument. The zswap code can set that boolean argument
during writeback to make sure newly allocated folios are always added
to the tail of the lru.