linux-kernel - Re: [PATCH -next v6] mm: vmscan: retry folios written back while isolated for traditional LRU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAOUHufbjCrhLM6dw4JH7QmKuJWjPHDWdeDxywbiH4m=wkJgDzg@mail.gmail.com>
Date: Tue, 24 Dec 2024 12:33:23 -0700
From: Yu Zhao <yuzhao@...gle.com>
To: Chen Ridong <chenridong@...weicloud.com>
Cc: akpm@...ux-foundation.org, mhocko@...e.com, hannes@...xchg.org, 
	yosryahmed@...gle.com, david@...hat.com, willy@...radead.org, 
	ryan.roberts@....com, baohua@...nel.org, 21cnbao@...il.com, 
	wangkefeng.wang@...wei.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	wangweiyang2@...wei.com, xieym_ict@...mail.com
Subject: Re: [PATCH -next v6] mm: vmscan: retry folios written back while
 isolated for traditional LRU

On Tue, Dec 24, 2024 at 12:28 PM Yu Zhao <yuzhao@...gle.com> wrote:
>
> On Mon, Dec 23, 2024 at 11:45 PM Chen Ridong <chenridong@...weicloud.com> wrote:
> >
> >
> >
> > On 2024/12/24 12:19, Yu Zhao wrote:
> > > On Mon, Dec 23, 2024 at 1:30 AM Chen Ridong <chenridong@...weicloud.com> wrote:
> > >>
> > >> From: Chen Ridong <chenridong@...wei.com>
> > >>
> > >> The page reclaim isolates a batch of folios from the tail of one of the
> > >> LRU lists and works on those folios one by one.  For a suitable
> > >> swap-backed folio, if the swap device is async, it queues that folio for
> > >> writeback.  After the page reclaim finishes an entire batch, it puts back
> > >> the folios it queued for writeback to the head of the original LRU list.
> > >>
> > >> In the meantime, the page writeback flushes the queued folios also by
> > >> batches.  Its batching logic is independent from that of the page reclaim.
> > >> For each of the folios it writes back, the page writeback calls
> > >> folio_rotate_reclaimable() which tries to rotate a folio to the tail.
> > >>
> > >> folio_rotate_reclaimable() only works for a folio after the page reclaim
> > >> has put it back.  If an async swap device is fast enough, the page
> > >> writeback can finish with that folio while the page reclaim is still
> > >> working on the rest of the batch containing it.  In this case, that folio
> > >> will remain at the head and the page reclaim will not retry it before
> > >> reaching there.
> > >
> > > For starters, copying & pasting others' commit messages as your own is
> > > plagiarism. You need to quote them.
> >
> > Hi, Yu, Thank you for reminding me, I did not mean any plagiarism. I am
> > a beginner, and I do not know much about that.
>
> I didn't think you were intentional but please also understand this is
> a very basic rule that applies everywhere, not just this case.
>
> > I wrote the message in my v1 and v2 to describe what issue I was fixing,
> > which is wordy. What you wrote is much clearer in the commit
> > 359a5e1416ca, so I pasted it. I am sorry. Should resend a new patch to
> > modify the message?
> >
> > I have sent all patch versions to you, and I don't know whether you have
> > noticed. How I wish you could point it out at the first I pasted it, so
> > I wouldn't have made this mistake again and again.
>
> Here is an example of how I quoted from another commit message:
> https://lore.kernel.org/20241107202033.2721681-5-yuzhao@google.com/
>
> > >> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
> > >> while isolated") only fixed the issue for mglru. However, this issue
> > >> also exists in the traditional active/inactive LRU.
> > >
> > > You need to prove it with some numbers.
> >
> > Do you mean I should prove it with some info in my message? Actually, I
> > encountered this in the traditional active/inactive LRU, I did not know
> > you had fixed for mglru until Barry told me. I offered how to reproduce
> > this with Link.
>
> Yes, the first link does describe how to repro, but you also need to
> show the results after your patch, i.e., the improvements from the
> change being reviewed.
>
> > >> This issue will be
> > >> worse if THP is split, which makes the list longer and needs longer time
> > >> to finish a batch of folios reclaim.
> > >>
> > >> This issue should be fixed in the same way for the traditional LRU.
> > >> Therefore, the common logic was extracted to the 'find_folios_written_back'
> > >> function firstly, which is then reused in the 'shrink_inactive_list'
> > >> function. Finally, retry reclaiming those folios that may have missed the
> > >> rotation for traditional LRU.
> > >>
> > >> Link: https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@huaweicloud.com/
> > >> Link: https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@mail.gmail.com/
> > >> Signed-off-by: Chen Ridong <chenridong@...wei.com>
> > >> Reviewed-by: Barry Song <baohua@...nel.org>
> > >> ---
> > >>
> > >> v5->v6:
> > >>  - fix compile error(implicit declaration of function 'lru_gen_distance')
> > >>    when CONFIG_LRU_GEN is disable.
> > >
> > > Did you build-test it this time? I don't think LRU_REFS_FLAGS is
> > > defined when CONFIG_LRU_GEN=y.
> > >
> >
> > Yes, I tested. I didn't test when CONFIG_LRU_GEN=n in patch v5.
> > I tested with CONFIG_LRU_GEN=n and CONFIG_LRU_GEN=y in patch v6. I am
> > using the next.
>
> I see. I would base my patches on mm-unstable instead -next, i.e.,
> git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable

> > >>  - rename 'is_retried' to is_retrying suggested by Barry Song.

Also there is a subtle difference here: `skip_retry` implies "skip
retry after it has retried N times", and currently N=1 for MGLRU. The
change you made implies "only retry once", i.e., you ruled out the
later possibility for N=0 or 2, etc.