Message-ID: <f21b75f9-0650-44c2-bf47-516390364a8b@lucifer.local>
Date: Thu, 24 Jul 2025 10:45:30 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Huan Yang <link@...o.com>
Cc: David Hildenbrand <david@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...riel.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Harry Yoo <harry.yoo@...cle.com>,
Xu Xin <xu.xin16@....com.cn>,
Chengming Zhou <chengming.zhou@...ux.dev>,
Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
Zi Yan <ziy@...dia.com>, Matthew Brost <matthew.brost@...el.com>,
Joshua Hahn <joshua.hahnjy@...il.com>, Rakie Kim <rakie.kim@...com>,
Byungchul Park <byungchul@...com>, Gregory Price <gourry@...rry.net>,
Ying Huang <ying.huang@...ux.alibaba.com>,
Alistair Popple <apopple@...dia.com>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Christian Brauner <brauner@...nel.org>,
Usama Arif <usamaarif642@...il.com>, Yu Zhao <yuzhao@...gle.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/9] introduce PGTY_mgt_entry page_type
On Thu, Jul 24, 2025 at 05:36:27PM +0800, Huan Yang wrote:
>
> On 2025/7/24 17:32, David Hildenbrand wrote:
> > On 24.07.25 11:20, David Hildenbrand wrote:
> > > On 24.07.25 11:12, David Hildenbrand wrote:
> > > > On 24.07.25 11:09, Huan Yang wrote:
> > > > >
> > > > > > On 2025/7/24 16:59, David Hildenbrand wrote:
> > > > > > On 24.07.25 10:44, Huan Yang wrote:
> > > > > > > Summary
> > > > > > > ==
> > > > > > > > This patchset reuses page_type to store migrate entry count during the
> > > > > > > > period from migrate entry setup to removal, enabling accelerated VMA
> > > > > > > > traversal when removing migrate entries, following a similar principle to
> > > > > > > > early termination when folio is unmapped in try_to_migrate.
> > > > > >
> > > > > > I absolutely detest (ab)using page types for that, so no from my side
> > > > > > unless I am missing something important.
> > > > > >
> > > > > > >
> > > > > > > > In my self-constructed test scenario, the migration time can be reduced
> > > > > >
> > > > > > How relevant is that in practice?
> > > > >
> > > > > > IMO, any folio mapped by fewer VMAs than are in its mapping (anon_vma,
> > > > > > address_space) will benefit from this.
> > > > >
> > > > > So, all pages that have been COW-ed by child processes can be skipped.
> > > >
> > > > > For small anon folios, you could use the anon-exclusive marker to derive
> > > > > "there can only be a single mapping".
> > > >
> > > > It's stored alongside the migration entry.
> > > >
> > > > So once you restored that single migration entry, you can just stop the
> > > > walk.
> > >
> > > Essentially, something (untested) like this:
> > >
> > > diff --git a/mm/migrate.c b/mm/migrate.c
> > > index 425401b2d4e14..aa5bf96b1daee 100644
> > > --- a/mm/migrate.c
> > > +++ b/mm/migrate.c
> > > @@ -421,6 +421,15 @@ static bool remove_migration_pte(struct folio
> > > *folio,
> > > /* No need to invalidate - it was non-present
> > > before */
> > > update_mmu_cache(vma, pvmw.address, pvmw.pte);
> > > +
> > > + /*
> > > > + * If the small anon folio is exclusive, there can be exactly one
> > > > + * page mapping -- the one we just restored.
> > > > + */
> > > > + if (!folio_test_large(folio) && (rmap_flags & RMAP_EXCLUSIVE)) {
> > > + page_vma_mapped_walk_done(&pvmw);
> > > + break;
> > > + }
> > > }
> > > return true;
> >
> > Probably that won't really help I assume, because __folio_set_anon()
> > will move the new anon folio under vma->anon_vma, not
> > vma->anon_vma->root.
> >
> > So I assume you mean that we had a COW-shared folio now mapped only into
> > some VMAs (some mappings in other processes removed due to CoW or
> > similar).
> >
> > In that case aborting early can help.
> >
> > Not in all cases though, just imagine that the very last VMA we're
> > iterating maps the page. You have to iterate through all of them either
> > way ... no way around that, really.
>
> Indeed, whether we can exit the loop early depends on the position of the
> terminating VMA in the tree.
>
> I think a better approach would be to remove the fully COW-ed VMAs and their
> associated AVCs from the anon_vma's tree.
>
> I've been researching this aspect, but haven't made any progress yet. (I have
> some ideas, but the specific implementation is still challenging.)
>
Please leave this alone. I'm in the midst of trying to make fundamental changes
to the anon rmap logic, and it's really very subtle and indeed challenging (as
you've seen).

Since I intend to change the whole mechanism around this, efforts to adjust the
existing behaviour are going to strictly conflict with that.

We are 'lazy' about properly accounting for fully CoW'd VMAs, so we can only
ever know that a VMA 'may' have been fully CoW'd, as you've noticed from the
discussion above.
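
To make the early-termination point from the quoted discussion concrete, here
is a toy user-space sketch. It is not kernel code: struct toy_vma, toy_walk()
and the flat array are made-up stand-ins for the VMAs reachable through the
anon_vma tree, and 'expected' is a hypothetical count of entries still to be
restored. It only shows that the saving from stopping early depends on where
the mapping VMA sits in the walk order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one VMA reachable through the rmap tree. */
struct toy_vma {
	bool maps_folio;	/* holds a migration entry for the folio? */
};

/*
 * Walk the VMAs in order, "restoring" each entry found, and stop as soon
 * as 'expected' entries have been restored.
 * Returns the number of VMAs that had to be visited.
 */
static size_t toy_walk(const struct toy_vma *vmas, size_t nr, size_t expected)
{
	size_t visited = 0, restored = 0;

	for (size_t i = 0; i < nr && restored < expected; i++) {
		visited++;
		if (vmas[i].maps_folio)
			restored++;
	}
	return visited;
}
```

With a single expected entry, the walk visits one VMA when the mapping VMA
comes first, and every VMA when it comes last, so in the worst case the early
exit saves nothing.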

The CoW hierarchy also makes life challenging, see vma_had_uncowed_parents() for
an example of the subtlety.

Having looked at anon rmap in detail, I have come to think the only sensible way
forward is something fairly bold.

Thanks!