Message-ID: <86516155-f2d9-4e8d-9d27-bdcb59e2d129@redhat.com>
Date: Thu, 24 Jul 2025 10:59:18 +0200
From: David Hildenbrand <david@...hat.com>
To: Huan Yang <link@...o.com>, Andrew Morton <akpm@...ux-foundation.org>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Rik van Riel
 <riel@...riel.com>, "Liam R. Howlett" <Liam.Howlett@...cle.com>,
 Vlastimil Babka <vbabka@...e.cz>, Harry Yoo <harry.yoo@...cle.com>,
 Xu Xin <xu.xin16@....com.cn>, Chengming Zhou <chengming.zhou@...ux.dev>,
 Mike Rapoport <rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>,
 Michal Hocko <mhocko@...e.com>, Zi Yan <ziy@...dia.com>,
 Matthew Brost <matthew.brost@...el.com>,
 Joshua Hahn <joshua.hahnjy@...il.com>, Rakie Kim <rakie.kim@...com>,
 Byungchul Park <byungchul@...com>, Gregory Price <gourry@...rry.net>,
 Ying Huang <ying.huang@...ux.alibaba.com>,
 Alistair Popple <apopple@...dia.com>,
 "Matthew Wilcox (Oracle)" <willy@...radead.org>,
 Christian Brauner <brauner@...nel.org>, Usama Arif <usamaarif642@...il.com>,
 Yu Zhao <yuzhao@...gle.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/9] introduce PGTY_mgt_entry page_type

On 24.07.25 10:44, Huan Yang wrote:
> Summary
> ==
> This patchset reuses page_type to store migrate entry count during the
> period from migrate entry setup to removal, enabling accelerated VMA
> traversal when removing migrate entries, following a similar principle to
> early termination when folio is unmapped in try_to_migrate.

I absolutely detest (ab)using page types for that, so no from my side 
unless I am missing something important.

> 
> In my self-constructed test scenario, the migration time can be reduced

How relevant is that in practice?

> from over 150+ms to around 30+ms, achieving nearly a 70% performance
> improvement. Additionally, the flame graph shows that the proportion of
> remove_migration_ptes can be reduced from 80%+ to 60%+.
> 
> Note: "migrate entry" here specifically refers to a migrate PTE entry, as
> large folios do not support reusing page_type while the mapcount is 0.
> 
> Principle
> ==
> When a page removes all PTEs in try_to_migrate and sets up a migrate PTE
> entry, we can determine whether the traversal of remaining VMAs can be
> terminated early by checking if mapcount is zero. This optimization
> helps improve performance during migration.
> 
> However, when removing migrate PTE entries and setting up PTEs for the
> destination folio in remove_migration_ptes, there is no such information
> available to assist in deciding whether the traversal of remaining VMAs
> can be ended early. Therefore, it is necessary to traverse all VMAs
> associated with this folio.

Yes, we don't know how many migration entries are still pointing at the 
page.
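
To make the cost concrete, here is a minimal user-space sketch (not kernel
code) of the walk with and without a count of outstanding migration
entries. The names toy_vma and toy_remove_migration_ptes are hypothetical
and exist only for this illustration; the real rmap walk is considerably
more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one VMA in the rmap walk: it only records how
 * many migration entries for the folio live in that VMA. */
struct toy_vma {
    unsigned int nr_migration_entries;
};

/*
 * Walk the VMAs and "restore" their migration entries.
 *
 * If `remaining` is non-NULL, it holds the number of entries still
 * outstanding and the walk stops as soon as it reaches zero. If it is
 * NULL, nothing is known about the count and every VMA must be visited.
 *
 * Returns the number of VMAs visited.
 */
static size_t toy_remove_migration_ptes(const struct toy_vma *vmas,
                                        size_t nr_vmas,
                                        unsigned int *remaining)
{
    size_t visited = 0;

    for (size_t i = 0; i < nr_vmas; i++) {
        /* Early termination: no entries left anywhere. */
        if (remaining && *remaining == 0)
            break;

        visited++;

        if (remaining) {
            /* The count must never drop below zero. */
            assert(*remaining >= vmas[i].nr_migration_entries);
            *remaining -= vmas[i].nr_migration_entries;
        }
    }
    return visited;
}
```

With a count available the walk can stop once the last entry is restored;
with NULL it always visits all nr_vmas VMAs, which is the situation today.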

> 
> In reality, when a folio is fully unmapped and before all migrate PTE
> entries are removed, the mapcount will always be zero. Since page_type
> and mapcount share a union, and referring to folio_mapcount, we can
> reuse page_type to record the number of migrate PTE entries of the
> current folio in the system as long as it's not a large folio. This
> reuse does not affect calls to folio_mapcount, which will always return
> zero.
> 
> Therefore, we can set the folio's page_type to PGTY_mgt_entry when
> try_to_migrate completes, the folio is already unmapped, and it's not a
> large folio. The remaining 24 bits can then be used to record the number
> of migrate PTE entries generated by try_to_migrate.

In the future the page type will no longer overlay the mapcount and, 
consequently, be sticky.
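
For reference, the layout described above (a type tag plus a 24-bit count
in one 32-bit word) can be sketched in user space as follows. This assumes
the type tag sits in the top 8 bits; the tag value 0xfa and all toy_*
names are made up for illustration and are not taken from the patchset.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PGTY_SHIFT      24
#define TOY_COUNT_MASK      ((UINT32_C(1) << TOY_PGTY_SHIFT) - 1)
#define TOY_PGTY_MGT_ENTRY  UINT32_C(0xfa)  /* hypothetical tag value */

/* Pack the type tag into the top 8 bits and the count into the low 24. */
static uint32_t toy_pack(uint32_t nr_entries)
{
    /* The count has to fit into the 24 bits left over by the tag. */
    assert(nr_entries <= TOY_COUNT_MASK);
    return (TOY_PGTY_MGT_ENTRY << TOY_PGTY_SHIFT) | nr_entries;
}

/* Extract the type tag from the top 8 bits. */
static uint32_t toy_type(uint32_t word)
{
    return word >> TOY_PGTY_SHIFT;
}

/* Extract the migration entry count from the low 24 bits. */
static uint32_t toy_count(uint32_t word)
{
    return word & TOY_COUNT_MASK;
}
```

The packing only works as long as the word is free for this purpose, i.e.
while it overlays a mapcount that is known to be zero.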

> 
> Then, in remove_migration_ptes, when the nr_mgt_entry count drops to
> zero, we can terminate the VMA traversal early.
> 
> It's important to note that we need to initialize the folio's page_type
> to PGTY_mgt_entry and set the migrate entry count only while holding the
> rmap walk lock. This is because during the lock period, we can prevent
> new VMA fork (which would increase migrate entries) and VMA unmap
> (which would decrease migrate entries).

The more I read about PGTY_mgt_entry, the more I hate it.

> 
> However, I doubt there is actually an additional critical section here, for
> example anon:
> 
> Process Parent                          fork
> try_to_migrate
>                                          anon_vma_clone
>                                              write_lock
>                                                  avc_inster_tree tail
>                                          ....
>      folio_lock_anon_vma_read             copy_pte_range
>          vma_iter                            pte_lock
>                  ....                           pte_present copy
>                                              ...
>                  pte_lock
>                      new forked pte clean
> ....
> remove_migration_ptes
>      rmap_walk_anon_lock
> 
> If my understanding is correct and such a critical section exists, it
> shouldn't cause any issues—newly added PTEs can still be properly
> removed and converted into migrate entries.
> 
> But in this:
> 
> Process Parent                          fork
> try_to_migrate
>                                          anon_vma_clone
>                                              write_lock
>                                                  avc_inster_tree
>                                          ....
>      folio_lock_anon_vma_read             copy_pte_range
>          vma_iter
>                  pte_lock
>                      migrate entry set
>                  ....                        pte_lock
>                                                  pte_nonpresent copy
>                                              ....
> ....
> remove_migration_ptes
>      rmap_walk_anon_lock

Just a note: migration entries also apply to non-anon folios.

-- 
Cheers,

David / dhildenb

