Message-ID: <86516155-f2d9-4e8d-9d27-bdcb59e2d129@redhat.com>
Date: Thu, 24 Jul 2025 10:59:18 +0200
From: David Hildenbrand <david@...hat.com>
To: Huan Yang <link@...o.com>, Andrew Morton <akpm@...ux-foundation.org>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Rik van Riel
<riel@...riel.com>, "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Harry Yoo <harry.yoo@...cle.com>,
Xu Xin <xu.xin16@....com.cn>, Chengming Zhou <chengming.zhou@...ux.dev>,
Mike Rapoport <rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>,
Michal Hocko <mhocko@...e.com>, Zi Yan <ziy@...dia.com>,
Matthew Brost <matthew.brost@...el.com>,
Joshua Hahn <joshua.hahnjy@...il.com>, Rakie Kim <rakie.kim@...com>,
Byungchul Park <byungchul@...com>, Gregory Price <gourry@...rry.net>,
Ying Huang <ying.huang@...ux.alibaba.com>,
Alistair Popple <apopple@...dia.com>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Christian Brauner <brauner@...nel.org>, Usama Arif <usamaarif642@...il.com>,
Yu Zhao <yuzhao@...gle.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/9] introduce PGTY_mgt_entry page_type
On 24.07.25 10:44, Huan Yang wrote:
> Summary
> ==
> This patchset reuses page_type to store migrate entry count during the
> period from migrate entry setup to removal, enabling accelerated VMA
> traversal when removing migrate entries, following a similar principle to
> early termination when folio is unmapped in try_to_migrate.
I absolutely detest (ab)using page types for that, so no from my side
unless I am missing something important.
>
> In my self-constructed test scenario, the migration time can be reduced
How relevant is that in practice?
> from over 150+ms to around 30+ms, achieving nearly a 70% performance
> improvement. Additionally, the flame graph shows that the proportion of
> remove_migration_ptes can be reduced from 80%+ to 60%+.
>
> Note: "migrate entry" here refers specifically to a migrate PTE entry, since
> large folios do not support reusing page_type while the mapcount is 0.
>
> Principle
> ==
> When a page removes all PTEs in try_to_migrate and sets up a migrate PTE
> entry, we can determine whether the traversal of remaining VMAs can be
> terminated early by checking if mapcount is zero. This optimization
> helps improve performance during migration.
>
> However, when removing migrate PTE entries and setting up PTEs for the
> destination folio in remove_migration_ptes, there is no such information
> available to assist in deciding whether the traversal of remaining VMAs
> can be ended early. Therefore, it is necessary to traverse all VMAs
> associated with this folio.
Yes, we don't know how many migration entries are still pointing at the
page.
>
> In reality, when a folio is fully unmapped and before all migrate PTE
> entries are removed, the mapcount will always be zero. Since page_type
> and mapcount share a union, and referring to folio_mapcount, we can
> reuse page_type to record the number of migrate PTE entries of the
> current folio in the system as long as it's not a large folio. This
> reuse does not affect calls to folio_mapcount, which will always return
> zero.
>
> Therefore, we can set the folio's page_type to PGTY_mgt_entry when
> try_to_migrate completes, the folio is already unmapped, and it's not a
> large folio. The remaining 24 bits can then be used to record the number
> of migrate PTE entries generated by try_to_migrate.
In the future the page type will no longer overlay the mapcount and,
consequently, be sticky.
>
> Then, in remove_migration_ptes, when the nr_mgt_entry count drops to
> zero, we can terminate the VMA traversal early.
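
A minimal user-space sketch of the counting scheme described in the quoted text above, for illustration only. It is not code from the patchset and does not use kernel APIs: `struct toy_folio`, the `toy_*` helpers, the tag value `0xF7`, and the "top 8 bits for the type, low 24 bits for the count" layout are all assumptions made for this example. It only shows how a stored count of remaining migration entries would let the VMA walk stop early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: top 8 bits tag the type, low 24 bits hold a count. */
#define TOY_TYPE_SHIFT 24
#define TOY_COUNT_MASK ((UINT32_C(1) << TOY_TYPE_SHIFT) - 1)
#define TOY_TYPE_MGT   UINT32_C(0xF7) /* made-up tag, not a real PGTY_* value */

/* Stand-in for a small folio; only models the reused page_type word. */
struct toy_folio {
    uint32_t page_type;
};

/* Record how many migration entries currently point at the folio. */
static void toy_set_mgt_entries(struct toy_folio *f, uint32_t nr)
{
    assert(nr <= TOY_COUNT_MASK);
    f->page_type = (TOY_TYPE_MGT << TOY_TYPE_SHIFT) | nr;
}

/* Read back the number of migration entries still outstanding. */
static uint32_t toy_mgt_entries(const struct toy_folio *f)
{
    return f->page_type & TOY_COUNT_MASK;
}

/*
 * Model of the removal walk: entries_in_vma[i] is the number of migration
 * entries found (and removed) in VMA i. The walk stops as soon as the
 * stored count reaches zero. Returns the number of VMAs visited.
 */
static size_t toy_remove_migration_ptes(struct toy_folio *f,
                                        const uint32_t *entries_in_vma,
                                        size_t nr_vmas)
{
    size_t visited = 0;

    while (visited < nr_vmas && toy_mgt_entries(f) != 0) {
        uint32_t left = toy_mgt_entries(f);
        uint32_t found = entries_in_vma[visited];

        assert(found <= left);
        toy_set_mgt_entries(f, left - found);
        visited++;
    }
    return visited;
}
```

With three entries spread over the first three of five VMAs, the walk in this model visits three VMAs instead of five. The sketch ignores locking entirely, so it says nothing about the fork and unmap races discussed in the rest of the thread.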
>
> It's important to note that we need to initialize the folio's page_type
> to PGTY_mgt_entry and set the migrate entry count only while holding the
> rmap walk lock. This is because during the lock period, we can prevent
> new VMA fork (which would increase migrate entries) and VMA unmap
> (which would decrease migrate entries).
The more I read about PGTY_mgt_entry, the more I hate it.
>
> However, I suspect there is actually an additional critical section here, for
> example with anon folios:
>
> Process Parent fork
> try_to_migrate
> anon_vma_clone
> write_lock
> avc_inster_tree tail
> ....
> folio_lock_anon_vma_read copy_pte_range
> vma_iter pte_lock
> .... pte_present copy
> ...
> pte_lock
> new forked pte clean
> ....
> remove_migration_ptes
> rmap_walk_anon_lock
>
> If my understanding is correct and such a critical section exists, it
> shouldn't cause any issues—newly added PTEs can still be properly
> removed and converted into migrate entries.
>
> But in this:
>
> Process Parent fork
> try_to_migrate
> anon_vma_clone
> write_lock
> avc_inster_tree
> ....
> folio_lock_anon_vma_read copy_pte_range
> vma_iter
> pte_lock
> migrate entry set
> .... pte_lock
> pte_nonpresent copy
> ....
> ....
> remove_migration_ptes
> rmap_walk_anon_lock
Just a note: migration entries also apply to non-anon folios.
--
Cheers,
David / dhildenb