Message-ID: <aFyAuvE-dksvHqlG@hyeyoo>
Date: Thu, 26 Jun 2025 08:05:30 +0900
From: Harry Yoo <harry.yoo@...cle.com>
To: Raghavendra K T <raghavendra.kt@....com>
Cc: AneeshKumar.KizhakeVeetil@....com, Hasan.Maruf@....com,
Michael.Day@....com, akpm@...ux-foundation.org, bharata@....com,
dave.hansen@...el.com, david@...hat.com, dongjoo.linux.dev@...il.com,
feng.tang@...el.com, gourry@...rry.net, hannes@...xchg.org,
honggyu.kim@...com, hughd@...gle.com, jhubbard@...dia.com,
jon.grimm@....com, k.shutemov@...il.com, kbusch@...a.com,
kmanaouil.dev@...il.com, leesuyeon0506@...il.com, leillc@...gle.com,
liam.howlett@...cle.com, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, mgorman@...hsingularity.net, mingo@...hat.com,
nadav.amit@...il.com, nphamcs@...il.com, peterz@...radead.org,
riel@...riel.com, rientjes@...gle.com, rppt@...nel.org,
santosh.shukla@....com, shivankg@....com, shy828301@...il.com,
sj@...nel.org, vbabka@...e.cz, weixugc@...gle.com, willy@...radead.org,
ying.huang@...ux.alibaba.com, ziy@...dia.com,
Jonathan.Cameron@...wei.com, dave@...olabs.net, yuanchu@...gle.com,
kinseyho@...gle.com, hdanton@...a.com
Subject: Re: [RFC PATCH V2 03/13] mm: Scan the mm and create a migration list
On Thu, Jun 26, 2025 at 07:07:12AM +0900, Harry Yoo wrote:
> On Tue, Jun 24, 2025 at 05:56:07AM +0000, Raghavendra K T wrote:
> > Since we already have the list of mm_structs in the system, add a
> > module that walks the VMAs of each mm_struct and scans all the pages
> > associated with them.
> >
> > In the scan path: check for recently accessed pages (folios) belonging
> > to slow-tier nodes, and add all those folios to a list.
> >
> > Signed-off-by: Raghavendra K T <raghavendra.kt@....com>
> > ---
>
> Hi, just taking a quick look...
>
> > mm/kscand.c | 319 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 318 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/kscand.c b/mm/kscand.c
> > index d5b0d3041b0f..0edec1b7730d 100644
> > --- a/mm/kscand.c
> > +++ b/mm/kscand.c
> > @@ -42,6 +55,8 @@ static struct kmem_cache *kscand_slot_cache __read_mostly;
> > @@ -84,11 +122,275 @@ static void kscand_wait_work(void)
> > scan_sleep_jiffies);
> > }
> >
> > +static inline bool is_valid_folio(struct folio *folio)
> > +{
> > + if (!folio || folio_test_unevictable(folio) || !folio_mapped(folio) ||
> > + folio_is_zone_device(folio) || folio_maybe_mapped_shared(folio))
> > + return false;
> > +
> > + return true;
> > +}
>
> What makes it undesirable to migrate shared folios?
>
> > +static bool folio_idle_clear_pte_refs_one(struct folio *folio,
> > + struct vm_area_struct *vma,
> > + unsigned long addr,
> > + pte_t *ptep)
> > +{
> > + bool referenced = false;
> > + struct mm_struct *mm = vma->vm_mm;
> > + pmd_t *pmd = pmd_off(mm, addr);
> > +
> > + if (ptep) {
> > + if (ptep_clear_young_notify(vma, addr, ptep))
> > + referenced = true;
> > + } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > + if (!pmd_present(*pmd))
> > + WARN_ON_ONCE(1);
> > + if (pmdp_clear_young_notify(vma, addr, pmd))
> > + referenced = true;
> > + } else {
> > + WARN_ON_ONCE(1);
> > + }
>
> This does not look good.
>
> I think PMD entries should be handled in a
> mm_walk_ops.pmd_entry callback?
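FWIW, what I had in mind is something along these lines (a completely
untested sketch: kscand_pmd_entry is a made-up name and the THP
locking is elided for brevity):

	static int kscand_pmd_entry(pmd_t *pmd, unsigned long addr,
				    unsigned long next, struct mm_walk *walk)
	{
		struct folio *folio;

		/* Not a THP leaf: fall through to ->pte_entry(). */
		if (!pmd_present(*pmd) || !pmd_trans_huge(*pmd))
			return 0;

		folio = page_folio(pmd_page(*pmd));
		if (pmdp_clear_young_notify(walk->vma, addr, pmd)) {
			folio_clear_idle(folio);
			folio_set_young(folio);
		}

		/* Tell the walker not to split or descend into PTEs. */
		walk->action = ACTION_CONTINUE;
		return 0;
	}

	static const struct mm_walk_ops hot_vma_set_idle_ops = {
		.pmd_entry	= kscand_pmd_entry,
		.pte_entry	= hot_vma_idle_pte_entry,
		.walk_lock	= PGWALK_RDLOCK,
	};

That way the PTE path never has to guess whether it was handed a PTE
or a PMD.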
>
> > +
> > + if (referenced) {
> > + folio_clear_idle(folio);
> > + folio_set_young(folio);
> > + }
> > +
> > + return true;
> > +}
> > +
> > +static void page_idle_clear_pte_refs(struct page *page, pte_t *pte, struct mm_walk *walk)
> > +{
> > + bool need_lock;
> > + struct folio *folio = page_folio(page);
> > + unsigned long address;
> > +
> > + if (!folio_mapped(folio) || !folio_raw_mapping(folio))
> > + return;
> > +
> > + need_lock = !folio_test_anon(folio) || folio_test_ksm(folio);
> > + if (need_lock && !folio_trylock(folio))
> > + return;
>
> Why acquire the folio lock here?
>
> And I'm not even sure it's safe to acquire it?
> The locking order is folio_lock -> pte_lock.
>
> The page walk should have already acquired the pte_lock before
> calling the ->pte_entry() callback.
Oops, it's a trylock, so the lock ordering concern doesn't apply.
Never mind, I needed more coffee this morning :)
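For the archive: the walker already holds the PTE lock when
->pte_entry() runs, so a blocking folio_lock() here could invert the
folio_lock -> pte_lock order, but folio_trylock() never sleeps. In
other words (the code is from the patch above, the comment is mine):

	/* The PTE lock is already held by the page walker; a blocking
	 * folio_lock() would invert the folio_lock -> pte_lock order.
	 * folio_trylock() never sleeps, so it cannot deadlock; on
	 * failure the folio is simply skipped for this scan round.
	 */
	if (need_lock && !folio_trylock(folio))
		return;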
> > + address = vma_address(walk->vma, page_pgoff(folio, page), compound_nr(page));
> > + VM_BUG_ON_VMA(address == -EFAULT, walk->vma);
> > + folio_idle_clear_pte_refs_one(folio, walk->vma, address, pte);
> > +
> > + if (need_lock)
> > + folio_unlock(folio);
> > +}
> > +
> > +static const struct mm_walk_ops hot_vma_set_idle_ops = {
> > + .pte_entry = hot_vma_idle_pte_entry,
> > + .walk_lock = PGWALK_RDLOCK,
> > +};
> > +
> > +static void kscand_walk_page_vma(struct vm_area_struct *vma, struct kscand_scanctrl *scanctrl)
> > +{
> > + if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
> > + is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
> > + return;
> > + }
> > + if (!vma->vm_mm ||
> > + (vma->vm_file && (vma->vm_flags & (VM_READ|VM_WRITE)) == (VM_READ)))
> > + return;
>
> Why not walk read-only file VMAs? The condition above skips
> file-backed VMAs that are mapped VM_READ without VM_WRITE.
>
> > + if (!vma_is_accessible(vma))
> > + return;
> > +
> > + walk_page_vma(vma, &hot_vma_set_idle_ops, scanctrl);
> > +}
>
--
Cheers,
Harry / Hyeonggon