Message-ID: <sct6vvupd4cp6xt66nn6sfs7w3srpx6zcxxsn6rz5qo4tz3la6@btdqsbicmrto>
Date: Tue, 14 Jan 2025 10:12:03 +0200
From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, 
	"Matthew Wilcox (Oracle)" <willy@...radead.org>, Jens Axboe <axboe@...nel.dk>, 
	"Jason A. Donenfeld" <Jason@...c4.com>, Andi Shyti <andi.shyti@...ux.intel.com>, 
	Chengming Zhou <chengming.zhou@...ux.dev>, Christian Brauner <brauner@...nel.org>, 
	Christophe Leroy <christophe.leroy@...roup.eu>, Dan Carpenter <dan.carpenter@...aro.org>, 
	David Airlie <airlied@...il.com>, David Hildenbrand <david@...hat.com>, Hao Ge <gehao@...inos.cn>, 
	Jani Nikula <jani.nikula@...ux.intel.com>, Johannes Weiner <hannes@...xchg.org>, 
	Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>, Josef Bacik <josef@...icpanda.com>, 
	Masami Hiramatsu <mhiramat@...nel.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, 
	Miklos Szeredi <miklos@...redi.hu>, Nhat Pham <nphamcs@...il.com>, 
	Oscar Salvador <osalvador@...e.de>, Ran Xiaokai <ran.xiaokai@....com.cn>, 
	Rodrigo Vivi <rodrigo.vivi@...el.com>, Simona Vetter <simona@...ll.ch>, 
	Steven Rostedt <rostedt@...dmis.org>, Tvrtko Ursulin <tursulin@...ulin.net>, 
	Vlastimil Babka <vbabka@...e.cz>, Yu Zhao <yuzhao@...gle.com>, intel-gfx@...ts.freedesktop.org, 
	dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	linux-mm@...ck.org, linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH 4/8] mm/swap: Use PG_dropbehind instead of PG_reclaim

On Mon, Jan 13, 2025 at 08:17:20AM -0800, Yosry Ahmed wrote:
> On Mon, Jan 13, 2025 at 1:35 AM Kirill A. Shutemov
> <kirill.shutemov@...ux.intel.com> wrote:
> >
> > The recently introduced PG_dropbehind allows for freeing folios
> > immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> > to be involved to get the folio freed.
> >
> > Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> > lru_deactivate_file().
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> > ---
> >  mm/swap.c | 8 +-------
> >  1 file changed, 1 insertion(+), 7 deletions(-)
> >
> > diff --git a/mm/swap.c b/mm/swap.c
> > index fc8281ef4241..4eb33b4804a8 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -562,14 +562,8 @@ static void lru_deactivate_file(struct lruvec *lruvec, struct folio *folio)
> >         folio_clear_referenced(folio);
> >
> >         if (folio_test_writeback(folio) || folio_test_dirty(folio)) {
> > -               /*
> > -                * Setting the reclaim flag could race with
> > -                * folio_end_writeback() and confuse readahead.  But the
> > -                * race window is _really_ small and  it's not a critical
> > -                * problem.
> > -                */
> >                 lruvec_add_folio(lruvec, folio);
> > -               folio_set_reclaim(folio);
> > +               folio_set_dropbehind(folio);
> >         } else {
> >                 /*
> >                  * The folio's writeback ended while it was in the batch.
> 
> Now there's a difference in behavior here depending on whether or not
> the folio is under writeback (or will be written back soon). If it is,
> we set PG_dropbehind to get it freed right after, but if writeback has
> already ended we put it on the tail of the LRU to be freed later.
> 
> It's a bit counterintuitive to me that folios with pending writeback
> get freed faster than folios that completed their writeback already.
> Am I missing something?

Yeah, it is strange.

I think we can drop the writeback/dirty check: set PG_dropbehind and put
the page on the tail of the LRU unconditionally. The check was required to
avoid confusion with PG_readahead.
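
To make the difference concrete, here is a rough userspace model of the two
behaviours. It is only a sketch, not kernel code: struct toy_folio, its
fields, and both function names are hypothetical stand-ins for the folio
flags and LRU placement discussed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct folio: only the state this thread talks about. */
struct toy_folio {
	bool dirty;        /* models folio_test_dirty() */
	bool writeback;    /* models folio_test_writeback() */
	bool dropbehind;   /* models PG_dropbehind */
	bool on_lru_tail;  /* true if queued at the tail of the LRU */
};

/* Patch as posted: only dirty/writeback folios get the flag. */
void deactivate_as_posted(struct toy_folio *f)
{
	assert(f);
	if (f->writeback || f->dirty) {
		f->on_lru_tail = false; /* lruvec_add_folio() in the diff */
		f->dropbehind = true;   /* freed once writeback completes */
	} else {
		f->on_lru_tail = true;  /* clean folio waits for reclaim */
	}
}

/* Variant suggested in this reply: no writeback/dirty check at all. */
void deactivate_proposed(struct toy_folio *f)
{
	assert(f);
	f->dropbehind = true;
	f->on_lru_tail = true;
}
```

With the second variant a clean folio also carries the flag, but something
still has to notice it and free the folio.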

The comment above the function is not valid anymore.

But a folio that is still dirty or under writeback will be freed faster,
since we get rid of it right after writeback is done, while a clean page
can linger on the LRU for a while.

I don't think we have any convenient place to free a clean dropbehind page
other than shrink_folio_list(). Or do we?

Looking at shrink_folio_list(), I think we need to bypass page demotion
for PG_dropbehind pages.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
