[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJD7tkb2oda=4f0s8w8xn+t_TM1b2Q_otbb86VPQ9R1m2uqDTA@mail.gmail.com>
Date: Mon, 8 Jan 2024 15:12:45 -0800
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Nhat Pham <nphamcs@...il.com>
Cc: Zhongkun He <hezhongkun.hzk@...edance.com>, akpm@...ux-foundation.org,
hannes@...xchg.org, sjenning@...hat.com, ddstreet@...e.org,
vitaly.wool@...sulko.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Chris Li <chrisl@...nel.org>, weijie.yang@...sung.com
Subject: Re: [External] Re: [PATCH] mm: zswap: fix the lack of page lru flag
in zswap_writeback_entry
On Sun, Jan 7, 2024 at 1:59 PM Nhat Pham <nphamcs@...il.com> wrote:
>
> On Sun, Jan 7, 2024 at 1:29 PM Nhat Pham <nphamcs@...il.com> wrote:
> >
> > On Fri, Jan 5, 2024 at 6:10 AM Zhongkun He <hezhongkun.hzk@...edance.com> wrote:
> > >
> > > > > There is another option here, which is not to move the page to the
> > > > > tail of the inactive
> > > > > list after end_writeback and delete the following code in
> > > > > zswap_writeback_entry(),
> > > > > which did not work properly. But the pages will not be released first.
> > > > >
> > > > > /* move it to the tail of the inactive list after end_writeback */
> > > > > SetPageReclaim(page);
> >
> >
> > Ok, so I took a look at the patch that originally introduced this
> > piece of logic:
> >
> > https://github.com/torvalds/linux/commit/b349acc76b7f65400b85abd09a5379ddd6fa5a97
> >
> > Looks like it's not for the sake of correctness, but only as a
> > best-effort optimization (reducing page scanning). If it doesn't bring
> > any benefit (i.e due to the newly allocated page still on the cpu
> > batch), then we can consider removing it. After all, if you're right
> > and it's not really doing anything here - why bother. Perhaps we can
> > replace this with some other mechanism to avoid it being scanned for
> > reclaim.
>
> For instance, we can grab the local lock, look for the folio in the
> add batch and take the folio off it, then add it to the rotate batch
> instead? Not sure if this is doable within folio_rotate_reclaimable(),
> or you'll have to manually perform this yourself (and remove the
> PG_reclaim flag set here so that folio_end_writeback() doesn't try to
> handle it).
>
> There is still some overhead with this, but at least we don't have to
> *drain everything* (which looks like what's lru_add_drain() ->
> lru_add_drain_cpu() is doing). The latter sounds expensive and
> unnecessary, whereas this is just one element addition and one element
> removal - and if IIUC the size of the per-cpu add batch is capped at
> 15, so lookup + removal (if possible) shouldn't be too expensive?
>
> Just throwing ideas out there :)
Sorry for being late to the party. It seems to me that all of this
hassle can be avoided if lru_add_fn() did the right thing in this case
and added the folio to the tail of the lru directly. I am no expert in
how the page flags work here, but it seems like we can do something
like this in lru_add_fn():
if (folio_test_reclaim(folio))
lruvec_add_folio_tail(lruvec, folio);
else
lruvec_add_folio(lruvec, folio);
I think the main problem with this is that PG_reclaim is an alias to
PG_readahead, so readahead pages will also go to the tail of the lru,
which is probably not good.
A more intrusive alternative is to introduce a folio_lru_add_tail()
variant that always adds pages to the tail, and optionally call that
from __read_swap_cache_async() instead of folio_lru_add() based on a
new boolean argument. The zswap code can set that boolean argument
during writeback to make sure newly allocated folios are always added
to the tail of the lru.
Powered by blists - more mailing lists