Message-ID: <CAKEwX=Na3dg+KZwvtQi-Nj79Am-1tttDw50_qStkobmYGUC6NA@mail.gmail.com>
Date: Tue, 9 Jan 2024 17:32:37 -0800
From: Nhat Pham <nphamcs@...il.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Zhongkun He <hezhongkun.hzk@...edance.com>, akpm@...ux-foundation.org, 
	hannes@...xchg.org, sjenning@...hat.com, ddstreet@...e.org, 
	vitaly.wool@...sulko.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	Chris Li <chrisl@...nel.org>
Subject: Re: [External] Re: [PATCH] mm: zswap: fix the lack of page lru flag
 in zswap_writeback_entry

On Tue, Jan 9, 2024 at 8:30 AM Yosry Ahmed <yosryahmed@...gle.com> wrote:
>
> On Mon, Jan 8, 2024 at 7:13 PM Zhongkun He <hezhongkun.hzk@...edance.com> wrote:
> >
> > Hi Yosry, glad to hear from you and happy new year!
> >
> > > Sorry for being late to the party. It seems to me that all of this
> > > hassle can be avoided if lru_add_fn() did the right thing in this case
> > > and added the folio to the tail of the lru directly. I am no expert in
> > > how the page flags work here, but it seems like we can do something
> > > like this in lru_add_fn():
> > >
> > > if (folio_test_reclaim(folio))
> > >     lruvec_add_folio_tail(lruvec, folio);
> > > else
> > >     lruvec_add_folio(lruvec, folio);
> > >
> > > I think the main problem with this is that PG_reclaim is an alias to
> > > PG_readahead, so readahead pages will also go to the tail of the lru,
> > > which is probably not good.

This sounds dangerous. This is going to introduce a rather large
unexpected side effect - we're changing the readahead behavior in a
seemingly small zswap optimization. In fact, I'd argue that if we do
this, the readahead behavior change will be the "main effect", and the
zswap-side change would be a "happy consequence". We should run a lot
of benchmarking and document the change extensively if we pursue this
route.
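
To make the concern concrete, here is a rough userspace model (not
kernel code) of the proposed check. The toy_* names and the bit position
are made up for illustration; the only property carried over from the
discussion above is that the readahead flag shares a bit with the
reclaim flag, so a test on one cannot tell them apart.

```c
#include <stdbool.h>

/* Toy flag layout: the bit number is arbitrary, only the aliasing matters. */
enum toy_pageflags {
	TOY_PG_reclaim = 0,
	TOY_PG_readahead = TOY_PG_reclaim,	/* same bit, as noted in the thread */
};

struct toy_folio {
	unsigned long flags;
};

void toy_set_reclaim(struct toy_folio *f)
{
	f->flags |= 1UL << TOY_PG_reclaim;
}

void toy_set_readahead(struct toy_folio *f)
{
	f->flags |= 1UL << TOY_PG_readahead;
}

bool toy_test_reclaim(const struct toy_folio *f)
{
	return (f->flags & (1UL << TOY_PG_reclaim)) != 0;
}

/* Models the proposed lru_add_fn() branch: true means "add to the tail". */
bool toy_goes_to_tail(const struct toy_folio *f)
{
	return toy_test_reclaim(f);
}
```

In this model a folio that was only ever marked for readahead still
takes the tail branch, which is the side effect I'm worried about.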

> >
> > Agree with you, I will try it.
>
> +Matthew Wilcox
>
> I think we need to figure out if it's okay to do this first, because
> it will affect pages with PG_readahead as well.
>
> >
> > >
> > > A more intrusive alternative is to introduce a folio_add_lru_tail()
> > > variant that always adds pages to the tail, and optionally call that
> > > from __read_swap_cache_async() instead of folio_add_lru() based on a
> > > new boolean argument. The zswap code can set that boolean argument
> > > during writeback to make sure newly allocated folios are always added
> > > to the tail of the lru.

Unless some page flag/readahead expert can confirm that the first
option is safe, my vote is on this option. I mean, it's fairly minimal
codewise, no? Just a bunch of plumbing. We can also keep the other
call sites intact if we just rename the old versions - something along
the lines of:

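__read_swap_cache_async_head(..., bool add_to_lru_head)
{
	...
	/* Only the zswap writeback path would pass false here. */
	if (add_to_lru_head)
		folio_add_lru(folio);
	else
		folio_add_lru_tail(folio);
}

/* Existing callers keep the old name and the old head-of-LRU behavior. */
__read_swap_cache_async(...)
{
	return __read_swap_cache_async_head(..., true);
}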

A bit of boilerplate? Sure. But this seems safer, and I doubt it's *that*
much more work.
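
For what it's worth, the plumbing can be modeled in userspace to check
the resulting order. This is only a sketch: the toy_* names are
hypothetical, the list is a plain doubly linked list rather than a real
lruvec, and I'm assuming reclaim takes folios from the tail first.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_folio {
	int id;
	struct toy_folio *prev, *next;
};

struct toy_lru {
	struct toy_folio *head;	/* most recently added, reclaimed last */
	struct toy_folio *tail;	/* reclaimed first */
};

/* Stand-in for folio_add_lru(): insert at the head. */
void toy_add_lru(struct toy_lru *lru, struct toy_folio *f)
{
	f->prev = NULL;
	f->next = lru->head;
	if (lru->head)
		lru->head->prev = f;
	else
		lru->tail = f;
	lru->head = f;
}

/* Stand-in for the proposed tail variant: insert at the tail. */
void toy_add_lru_tail(struct toy_lru *lru, struct toy_folio *f)
{
	f->next = NULL;
	f->prev = lru->tail;
	if (lru->tail)
		lru->tail->next = f;
	else
		lru->head = f;
	lru->tail = f;
}

/* Core helper carrying the new boolean. */
void toy_swapin_core(struct toy_lru *lru, struct toy_folio *f,
		     bool add_to_lru_head)
{
	if (add_to_lru_head)
		toy_add_lru(lru, f);
	else
		toy_add_lru_tail(lru, f);
}

/* Existing callers: behavior unchanged. */
void toy_swapin(struct toy_lru *lru, struct toy_folio *f)
{
	toy_swapin_core(lru, f, true);
}

/* Writeback path: the only caller that asks for the tail. */
void toy_writeback_swapin(struct toy_lru *lru, struct toy_folio *f)
{
	toy_swapin_core(lru, f, false);
}
```

Two regular swap-ins followed by one writeback swap-in leave the
writeback folio at the tail, and nothing changes for the other callers.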

> >
> > I have the same idea and also find it intrusive. I think the first solution
> > is very good and I will try it. If it works, I will send the next version.
>
> One way to avoid introducing folio_add_lru_tail() and plumbing a
> boolean from zswap is to have a per-task context (similar to
> memalloc_nofs_save()), that makes folio_add_lru() automatically add
> folios to the tail of the LRU. I am not sure if this is an acceptable
> approach though in terms of per-task flags and such.

This seems a bit hacky and obscure, but maybe it could work.
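
If I understand the idea, it would follow the same save/restore shape
as the memalloc_nofs_save() pair. A userspace sketch of that shape,
with every name here hypothetical (there is no such task flag today)
and a thread-local variable standing in for the per-task flags:

```c
#include <stdbool.h>

#define TOY_PF_LRU_TAIL 0x1u	/* hypothetical per-task flag bit */

/* Stand-in for the per-task flags word. */
static _Thread_local unsigned int toy_task_flags;

/* Enter the "add to tail" scope; returns the previous state of the bit. */
unsigned int toy_lru_tail_save(void)
{
	unsigned int old = toy_task_flags & TOY_PF_LRU_TAIL;

	toy_task_flags |= TOY_PF_LRU_TAIL;
	return old;
}

/* Leave the scope, putting the bit back to what save() returned. */
void toy_lru_tail_restore(unsigned int old)
{
	toy_task_flags = (toy_task_flags & ~TOY_PF_LRU_TAIL) | old;
}

/* What the LRU add path would consult instead of a boolean argument. */
bool toy_add_goes_to_tail(void)
{
	return (toy_task_flags & TOY_PF_LRU_TAIL) != 0;
}
```

Returning the old bit from save() is what lets the scopes nest: an
inner restore leaves the outer scope still in effect. The obscure part
is that the LRU add path changes behavior with no visible argument at
the call site.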
