Message-ID: <CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@mail.gmail.com>
Date: Fri, 29 Nov 2024 16:07:29 +1300
From: Barry Song <21cnbao@...il.com>
To: chenridong <chenridong@...wei.com>
Cc: Yu Zhao <yuzhao@...gle.com>, Matthew Wilcox <willy@...radead.org>, Chris Li <chrisl@...nel.org>,
Chen Ridong <chenridong@...weicloud.com>, akpm@...ux-foundation.org, mhocko@...e.com,
hannes@...xchg.org, yosryahmed@...gle.com, david@...hat.com,
ryan.roberts@....com, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
wangweiyang2@...wei.com, xieym_ict@...mail.com
Subject: Re: [RFC PATCH v2 1/1] mm/vmscan: move the written-back folios to the
tail of LRU after shrinking
On Fri, Nov 29, 2024 at 3:25 PM chenridong <chenridong@...wei.com> wrote:
>
>
>
> On 2024/11/29 7:08, Barry Song wrote:
> > On Mon, Nov 25, 2024 at 2:19 PM chenridong <chenridong@...wei.com> wrote:
> >>
> >>
> >>
> >> On 2024/11/18 12:21, Matthew Wilcox wrote:
> >>> On Mon, Nov 18, 2024 at 05:14:14PM +1300, Barry Song wrote:
> >>>> On Mon, Nov 18, 2024 at 5:03 PM Matthew Wilcox <willy@...radead.org> wrote:
> >>>>>
> >>>>> On Sat, Nov 16, 2024 at 09:16:58AM +0000, Chen Ridong wrote:
> >>>>>> 2. In the shrink_page_list function, if folioN is a THP (2M), it may be
> >>>>>> split and added to the swap cache folio by folio. After a folio is added
> >>>>>> to the swap cache, IO is submitted to write it back to swap, which is
> >>>>>> asynchronous. When shrink_page_list finishes, the isolated folio list is
> >>>>>> moved back to the head of the inactive LRU. The inactive LRU may then
> >>>>>> look like this, with 512 folios having been moved to its head.
> >>>>>
> >>>>> I was hoping that we'd be able to stop splitting the folio when adding
> >>>>> to the swap cache. Ideally, we'd add the whole 2MB and write it back
> >>>>> as a single unit.
> >>>>
> >>>> This is already the case: adding to the swapcache doesn’t require splitting
> >>>> THPs, but failing to allocate 2MB of contiguous swap slots will.
> >>>
> >>> Agreed we need to understand why this is happening. As I've said a few
> >>> times now, we need to stop requiring contiguity. Real filesystems don't
> >>> need the contiguity (they become less efficient, but they can scatter a
> >>> single 2MB folio to multiple places).
> >>>
> >>> Maybe Chris has a solution to this in the works?
> >>>
> >>
> >> Hi, Chris, do you have a better idea to solve this issue?
> >
> > Not Chris, but as I read the code again, we already have the code below in
> > evict_folios() to fix up the "missed folio_rotate_reclaimable()" issue:
> >
> >     /* retry folios that may have missed folio_rotate_reclaimable() */
> >     list_move(&folio->lru, &clean);
> >
> > It doesn't work for you?
> >
> > commit 359a5e1416caaf9ce28396a65ed3e386cc5de663
> > Author: Yu Zhao <yuzhao@...gle.com>
> > Date: Tue Nov 15 18:38:07 2022 -0700
> >
> >     mm: multi-gen LRU: retry folios written back while isolated
> >
> > The page reclaim isolates a batch of folios from the tail of one of the
> > LRU lists and works on those folios one by one. For a suitable
> > swap-backed folio, if the swap device is async, it queues that folio for
> > writeback. After the page reclaim finishes an entire batch, it puts back
> > the folios it queued for writeback to the head of the original LRU list.
> >
> > In the meantime, the page writeback flushes the queued folios also by
> > batches. Its batching logic is independent from that of the page reclaim.
> > For each of the folios it writes back, the page writeback calls
> > folio_rotate_reclaimable() which tries to rotate a folio to the tail.
> >
> >
> > folio_rotate_reclaimable() only works for a folio after the page reclaim
> > has put it back. If an async swap device is fast enough, the page
> > writeback can finish with that folio while the page reclaim is still
> > working on the rest of the batch containing it. In this case, that folio
> > will remain at the head and the page reclaim will not retry it before
> > reaching there.
> >
> > This patch adds a retry to evict_folios(). After evict_folios() has
> > finished an entire batch and before it puts back folios it cannot free
> > immediately, it retries those that may have missed the rotation.
> > Before this patch, ~60% of folios swapped to an Intel Optane missed
> > folio_rotate_reclaimable(). After this patch, ~99% of missed folios were
> > reclaimed upon retry.
> >
> > This problem affects relatively slow async swap devices like Samsung 980
> > Pro much less and does not affect sync swap devices like zram or zswap at
> > all.
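
For context on why the rotation is "missed": an isolated folio has PG_lru
cleared, so if its writeback completes while the reclaim batch still holds the
folio on its private list, the guard in folio_rotate_reclaimable() bails out
and the folio is never queued for the move to the tail. Roughly (paraphrasing
the check in mm/swap.c, not quoting it verbatim):

	/* folio_rotate_reclaimable(), called from folio_end_writeback() */
	if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
	    !folio_test_unevictable(folio) && folio_test_lru(folio)) {
		/* queue the folio to be moved to the tail of its LRU */
		...
	}
	/* an isolated folio fails folio_test_lru(), so nothing happens */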
> >
> >>
> >> Best regards,
> >> Ridong
> >
> > Thanks
> > Barry
>
> Thank you for your reply, Barry.
> I found this issue on the 5.10 kernel. I reproduced it on the next tree,
> but with CONFIG_LRU_GEN_ENABLED disabled. When I tested again with
> CONFIG_LRU_GEN_ENABLED enabled, the issue was fixed.
>
> IIUC, commit 359a5e1416caaf9ce28396a65ed3e386cc5de663 only takes effect
> when CONFIG_LRU_GEN_ENABLED is enabled, but the issue also exists when
> CONFIG_LRU_GEN_ENABLED is disabled, and that case should be fixed as well.
>
> I read the code of commit 359a5e1416caaf9ce28396a65ed3e386cc5de663; it
> finds the folios that missed rotation in a more complicated way, but it
> makes what is being done much clearer. Should I implement the fix in Yu
> Zhao's way?
Yes, this is exactly the same issue. Since Yu only fixed it in MGLRU and you
are still using the active/inactive LRU, the same fix should be applied to
the active/inactive LRU; a rough sketch of what that could look like is below.
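
Something along these lines in shrink_inactive_list(), after
shrink_folio_list() returns and before move_folios_to_lru(), might be a
starting point. This is only an untested sketch mirroring the MGLRU retry
(names follow current mm/vmscan.c; accounting and flag handling would need
more care in a real patch):

	LIST_HEAD(clean);
	struct folio *folio, *next;
	bool skip_retry = false;

retry:
	nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &stat, false);

	list_for_each_entry_safe_reverse(folio, next, &folio_list, lru) {
		/*
		 * Writeback completed while the folio was still isolated, so
		 * folio_rotate_reclaimable() could not rotate it; give it one
		 * more pass instead of parking it at the head of the LRU.
		 */
		if (!skip_retry && !folio_test_active(folio) &&
		    !folio_test_referenced(folio) && !folio_mapped(folio) &&
		    !folio_test_locked(folio) && !folio_test_dirty(folio) &&
		    !folio_test_writeback(folio))
			list_move(&folio->lru, &clean);
	}

	if (!list_empty(&clean)) {
		/* run the missed folios through shrink_folio_list() once more */
		list_splice_init(&clean, &folio_list);
		skip_retry = true;
		goto retry;
	}

	spin_lock_irq(&lruvec->lru_lock);
	move_folios_to_lru(lruvec, &folio_list);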
>
> Best regards,
> Ridong
Thanks,
Barry