Message-ID: <CAGsJ_4yA5graSSE3cBf_RB=cGc3hLpcB-3pR9ymVfzKx_dg3Zg@mail.gmail.com>
Date: Fri, 29 Nov 2024 12:08:48 +1300
From: Barry Song <21cnbao@...il.com>
To: chenridong <chenridong@...wei.com>
Cc: Matthew Wilcox <willy@...radead.org>, Chris Li <chrisl@...nel.org>,
Chen Ridong <chenridong@...weicloud.com>, akpm@...ux-foundation.org, mhocko@...e.com,
hannes@...xchg.org, yosryahmed@...gle.com, yuzhao@...gle.com,
david@...hat.com, ryan.roberts@....com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, wangweiyang2@...wei.com, xieym_ict@...mail.com
Subject: Re: [RFC PATCH v2 1/1] mm/vmscan: move the written-back folios to the
tail of LRU after shrinking
On Mon, Nov 25, 2024 at 2:19 PM chenridong <chenridong@...wei.com> wrote:
>
>
>
> On 2024/11/18 12:21, Matthew Wilcox wrote:
> > On Mon, Nov 18, 2024 at 05:14:14PM +1300, Barry Song wrote:
> >> On Mon, Nov 18, 2024 at 5:03 PM Matthew Wilcox <willy@...radead.org> wrote:
> >>>
> >>> On Sat, Nov 16, 2024 at 09:16:58AM +0000, Chen Ridong wrote:
> >>>> 2. In the shrink_page_list function, if folioN is a THP (2M), it may be
> >>>>    split and added to the swap cache folio by folio. After being added
> >>>>    to the swap cache, I/O is submitted to write the folios back to swap,
> >>>>    which is asynchronous. When shrink_page_list finishes, the list of
> >>>>    isolated folios is moved back to the head of the inactive LRU. The
> >>>>    inactive LRU may then look like this, with 512 folios having been
> >>>>    moved to its head.
> >>>
> >>> I was hoping that we'd be able to stop splitting the folio when adding
> >>> to the swap cache. Ideally, we'd add the whole 2MB and write it back
> >>> as a single unit.
> >>
> >> This is already the case: adding to the swapcache doesn’t require splitting
> >> THPs, but failing to allocate 2MB of contiguous swap slots will.
> >
> > Agreed we need to understand why this is happening. As I've said a few
> > times now, we need to stop requiring contiguity. Real filesystems don't
> > need the contiguity (they become less efficient, but they can scatter a
> > single 2MB folio to multiple places).
> >
> > Maybe Chris has a solution to this in the works?
> >
>
> Hi Chris, do you have a better idea for solving this issue?
Not Chris, but as I read the code again, we already have the code below
in evict_folios() to fix up the "missed folio_rotate_reclaimable()"
issue:

	/* retry folios that may have missed folio_rotate_reclaimable() */
	list_move(&folio->lru, &clean);

Doesn't that work for you?
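For context, this is roughly how that retry is wired up in
evict_folios() (a simplified sketch from my reading of the mainline
code; the surrounding details are elided and approximate):

retry:
	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
	...
	list_for_each_entry_safe_reverse(folio, next, &list, lru) {
		/* retry folios that may have missed folio_rotate_reclaimable() */
		if (!skip_retry && folio_test_reclaim(folio) &&
		    (folio_test_dirty(folio) || folio_test_writeback(folio))) {
			list_move(&folio->lru, &clean);
			continue;
		}
		...
	}
	...
	if (!list_empty(&clean)) {
		list_splice_init(&clean, &list);
		skip_retry = true;	/* retry at most once */
		goto retry;
	}

The commit that introduced it: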
commit 359a5e1416caaf9ce28396a65ed3e386cc5de663
Author: Yu Zhao <yuzhao@...gle.com>
Date:   Tue Nov 15 18:38:07 2022 -0700

    mm: multi-gen LRU: retry folios written back while isolated

    The page reclaim isolates a batch of folios from the tail of one of the
    LRU lists and works on those folios one by one.  For a suitable
    swap-backed folio, if the swap device is async, it queues that folio for
    writeback.  After the page reclaim finishes an entire batch, it puts back
    the folios it queued for writeback to the head of the original LRU list.

    In the meantime, the page writeback flushes the queued folios also by
    batches.  Its batching logic is independent from that of the page reclaim.
    For each of the folios it writes back, the page writeback calls
    folio_rotate_reclaimable() which tries to rotate a folio to the tail.

    folio_rotate_reclaimable() only works for a folio after the page reclaim
    has put it back.  If an async swap device is fast enough, the page
    writeback can finish with that folio while the page reclaim is still
    working on the rest of the batch containing it.  In this case, that folio
    will remain at the head and the page reclaim will not retry it before
    reaching there.

    This patch adds a retry to evict_folios().  After evict_folios() has
    finished an entire batch and before it puts back folios it cannot free
    immediately, it retries those that may have missed the rotation.

    Before this patch, ~60% of folios swapped to an Intel Optane missed
    folio_rotate_reclaimable().  After this patch, ~99% of missed folios were
    reclaimed upon retry.

    This problem affects relatively slow async swap devices like Samsung 980
    Pro much less and does not affect sync swap devices like zram or zswap at
    all.
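As the commit message says, the "miss" happens because
folio_rotate_reclaimable() can only rotate a folio that is back on an
LRU list; while the reclaim still holds the folio isolated, the LRU
flag is clear and the rotation is silently skipped. Roughly (a
simplified sketch based on mm/swap.c; the exact checks vary across
kernel versions):

void folio_rotate_reclaimable(struct folio *folio)
{
	if (folio_test_locked(folio) || folio_test_dirty(folio) ||
	    folio_test_unevictable(folio))
		return;

	folio_get(folio);
	/* fails while the reclaim still holds the folio isolated off the LRU */
	if (!folio_test_clear_lru(folio)) {
		folio_put(folio);
		return;
	}

	/* ... queue the folio to be moved to the tail of its LRU list ... */
}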
>
> Best regards,
> Ridong
Thanks
Barry