lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 8 Jan 2024 15:12:45 -0800
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Nhat Pham <nphamcs@...il.com>
Cc: Zhongkun He <hezhongkun.hzk@...edance.com>, akpm@...ux-foundation.org, 
	hannes@...xchg.org, sjenning@...hat.com, ddstreet@...e.org, 
	vitaly.wool@...sulko.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	Chris Li <chrisl@...nel.org>, weijie.yang@...sung.com
Subject: Re: [External] Re: [PATCH] mm: zswap: fix the lack of page lru flag
 in zswap_writeback_entry

On Sun, Jan 7, 2024 at 1:59 PM Nhat Pham <nphamcs@...il.com> wrote:
>
> On Sun, Jan 7, 2024 at 1:29 PM Nhat Pham <nphamcs@...il.com> wrote:
> >
> > On Fri, Jan 5, 2024 at 6:10 AM Zhongkun He <hezhongkun.hzk@...edance.com> wrote:
> > >
> > > > > There is another option here, which is not to move the page to the
> > > > > tail of the inactive
> > > > > list after end_writeback and delete the following code in
> > > > > zswap_writeback_entry(),
> > > > > which did not work properly. But the pages will not be released first.
> > > > >
> > > > > /* move it to the tail of the inactive list after end_writeback */
> > > > > SetPageReclaim(page);
> >
> >
> > Ok, so I took a look at the patch that originally introduced this
> > piece of logic:
> >
> > https://github.com/torvalds/linux/commit/b349acc76b7f65400b85abd09a5379ddd6fa5a97
> >
> > Looks like it's not for the sake of correctness, but only as a
> > best-effort optimization (reducing page scanning). If it doesn't bring
> > any benefit (i.e due to the newly allocated page still on the cpu
> > batch), then we can consider removing it. After all, if you're right
> > and it's not really doing anything here - why bother. Perhaps we can
> > replace this with some other mechanism to avoid it being scanned for
> > reclaim.
>
> For instance, we can grab the local lock, look for the folio in the
> add batch and take the folio off it, then add it to the rotate batch
> instead? Not sure if this is doable within folio_rotate_reclaimable(),
> or you'll have to manually perform this yourself (and remove the
> PG_reclaim flag set here so that folio_end_writeback() doesn't try to
> handle it).
>
> There is still some overhead with this, but at least we don't have to
> *drain everything* (which looks like what's lru_add_drain() ->
> lru_add_drain_cpu() is doing). The latter sounds expensive and
> unnecessary, whereas this is just one element addition and one element
> removal - and if IIUC the size of the per-cpu add batch is capped at
> 15, so lookup + removal (if possible) shouldn't be too expensive?
>
> Just throwing ideas out there :)

Sorry for being late to the party. It seems to me that all of this
hassle can be avoided if lru_add_fn() did the right thing in this case
and added the folio to the tail of the lru directly. I am no expert in
how the page flags work here, but it seems like we can do something
like this in lru_add_fn():

if (folio_test_reclaim(folio))
    lruvec_add_folio_tail(lruvec, folio);
else
    lruvec_add_folio(lruvec, folio);

I think the main problem with this is that PG_reclaim is an alias to
PG_readahead, so readahead pages will also go to the tail of the lru,
which is probably not good.

A more intrusive alternative is to introduce a folio_lru_add_tail()
variant that always adds pages to the tail, and optionally call that
from __read_swap_cache_async() instead of folio_lru_add() based on a
new boolean argument. The zswap code can set that boolean argument
during writeback to make sure newly allocated folios are always added
to the tail of the lru.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ