Message-ID: <40cb7d64-0b90-4561-8e10-06a808a2766a@vivo.com>
Date: Thu, 24 Jul 2025 17:29:30 +0800
From: Huan Yang <link@...o.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>, Rik van Riel <riel@...riel.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka
<vbabka@...e.cz>, Harry Yoo <harry.yoo@...cle.com>,
Xu Xin <xu.xin16@....com.cn>, Chengming Zhou <chengming.zhou@...ux.dev>,
Mike Rapoport <rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>,
Michal Hocko <mhocko@...e.com>, Zi Yan <ziy@...dia.com>,
Matthew Brost <matthew.brost@...el.com>,
Joshua Hahn <joshua.hahnjy@...il.com>, Rakie Kim <rakie.kim@...com>,
Byungchul Park <byungchul@...com>, Gregory Price <gourry@...rry.net>,
Ying Huang <ying.huang@...ux.alibaba.com>,
Alistair Popple <apopple@...dia.com>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Christian Brauner <brauner@...nel.org>, Usama Arif <usamaarif642@...il.com>,
Yu Zhao <yuzhao@...gle.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/9] introduce PGTY_mgt_entry page_type
On 2025/7/24 17:15, Lorenzo Stoakes wrote:
> NAK. This series is completely un-upstreamable in any form.
>
> David has responded to you already, but to underline.
>
> The lesson here is that you really ought to discuss things with people in
> the subsystem you are changing in advance of spending a lot of time doing
> work like this which you intend to upstream.
Yes, this is a very useful lesson. :)
In the future, when I have ideas in this area, I will bring them up for
discussion first, especially when they involve folios or pages.
>
> On Thu, Jul 24, 2025 at 04:44:28PM +0800, Huan Yang wrote:
>> Summary
>> ==
>> This patchset reuses page_type to store migrate entry count during the
>> period from migrate entry setup to removal, enabling accelerated VMA
>> traversal when removing migrate entries, following a similar principle to
>> early termination when folio is unmapped in try_to_migrate.
>>
>> In my self-constructed test scenario, the migration time can be reduced
>> from 150+ ms to around 30+ ms, achieving nearly a 70% performance
>> improvement. Additionally, the flame graph shows that the proportion of
>> remove_migration_ptes can be reduced from 80%+ to 60%+.
> This sounds completely contrived. I don't even know if you have a use case
> here.
The test case I provided does have an amplified effect, but the
optimization it demonstrates is real. It's just that when scaled up to
the system level, the effect becomes difficult to observe.
>
>> Note: "migrate entry" here refers specifically to a migrate PTE entry,
>> since large folios do not support reusing page_type while mapcount is 0.
>>
>> Principle
>> ==
>> When a page removes all PTEs in try_to_migrate and sets up a migrate PTE
>> entry, we can determine whether the traversal of remaining VMAs can be
>> terminated early by checking if mapcount is zero. This optimization
>> helps improve performance during migration.
>>
>> However, when removing migrate PTE entries and setting up PTEs for the
>> destination folio in remove_migration_ptes, there is no such information
>> available to assist in deciding whether the traversal of remaining VMAs
>> can be ended early. Therefore, it is necessary to traverse all VMAs
>> associated with this folio.
>>
>> In reality, when a folio is fully unmapped and before all migrate PTE
>> entries are removed, the mapcount will always be zero. Since page_type
>> and mapcount share a union, and referring to folio_mapcount, we can
>> reuse page_type to record the number of migrate PTE entries of the
>> current folio in the system as long as it's not a large folio. This
>> reuse does not affect calls to folio_mapcount, which will always return
>> zero.
> OK so - if you ever find yourself thinking this way, please stop. We are in
> the midst of fundamentally changing how folios and pages work.
>
> There is absolutely ZERO room for reusing arbitrary fields in this way. Any
> series that attempts to do this will be rejected.
>
> Again, I must say - if you had raised this ahead of time we could have
> saved you some effort.
>
>> Therefore, we can set the folio's page_type to PGTY_mgt_entry when
>> try_to_migrate completes, the folio is already unmapped, and it's not a
>> large folio. The remaining 24 bits can then be used to record the number
>> of migrate PTE entries generated by try_to_migrate.
> I mean there's so much wrong here. The future is large folios. Making some
> fundamental change that relies on not-large folio is a mistake. 24
> bits... I mean no.
Thanks, I understand it.
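For anyone following the thread, the packing described in the quoted
paragraph can be sketched in plain userspace C: a type tag in the top 8
bits of a 32-bit word and the entry count in the low 24 bits. This is only
an illustration under stated assumptions. The tag value and every name
below (PGTY_MGT_ENTRY_SKETCH, mgt_pack, mgt_is_tagged, mgt_count) are
hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration only; the real PGTY_* encoding is
 * defined by the kernel and is not reproduced here. */
#define PGTY_MGT_ENTRY_SKETCH 0xf7u   /* assumed type tag, kept in the top 8 bits */
#define MGT_COUNT_BITS        24u     /* low bits left over for the entry count */
#define MGT_COUNT_MASK        ((1u << MGT_COUNT_BITS) - 1u)

/* Pack the type tag and an entry count into one 32-bit word. */
static uint32_t mgt_pack(uint32_t count)
{
    assert(count <= MGT_COUNT_MASK);  /* 24 bits cap the count at 16777215 */
    return (PGTY_MGT_ENTRY_SKETCH << MGT_COUNT_BITS) | count;
}

/* Report whether the word carries the (hypothetical) migrate-entry tag. */
static int mgt_is_tagged(uint32_t word)
{
    return (word >> MGT_COUNT_BITS) == PGTY_MGT_ENTRY_SKETCH;
}

/* Extract the entry count from the low 24 bits. */
static uint32_t mgt_count(uint32_t word)
{
    return word & MGT_COUNT_MASK;
}
```

The hard cap enforced by the assert is exactly the limitation raised in the
review: the count cannot exceed what 24 bits can hold, and the scheme only
applies while the word is not needed for anything else.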
>
>> Then, in remove_migration_ptes, when the nr_mgt_entry count drops to
>> zero, we can terminate the VMA traversal early.
>>
>> It's important to note that we need to initialize the folio's page_type
>> to PGTY_mgt_entry and set the migrate entry count only while holding the
>> rmap walk lock. This is because during the lock period, we can prevent
>> new VMA fork (which would increase migrate entries) and VMA unmap
>> (which would decrease migrate entries).
> No, no no. NO.
>
> You are not introducing new locking complexity for this.
>
> I could go on, but there's no point.
>
> This series is not upstreamable, NAK.
>
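For readers who want the rejected idea in concrete form, the early
termination described in the quoted cover letter amounts to a counted walk:
record how many migration entries were created, then stop the removal walk
once that many have been restored. The sketch below is a userspace model
only. struct vma_sketch and remove_entries_sketch are hypothetical names,
and nothing here reflects the actual rmap walk or its locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one mapping that may hold a migration entry. */
struct vma_sketch {
    int has_migration_entry;  /* 1 if this mapping holds a migration entry */
    int restored;             /* set once the entry is replaced by a real PTE */
};

/*
 * Walk the mappings and restore migration entries. `remaining` is the
 * count recorded at unmap time; the loop ends as soon as it reaches zero,
 * so mappings after the last entry are never visited.
 * Returns the number of mappings visited.
 */
static size_t remove_entries_sketch(struct vma_sketch *vmas, size_t n,
                                    unsigned int remaining)
{
    size_t visited = 0;

    assert(vmas != NULL || n == 0);

    for (size_t i = 0; i < n && remaining > 0; i++) {
        visited++;
        if (vmas[i].has_migration_entry) {
            vmas[i].has_migration_entry = 0;
            vmas[i].restored = 1;
            remaining--;
        }
    }
    return visited;
}
```

With six mappings of which only the first and third hold an entry, a walk
started with a count of 2 visits three mappings instead of six. The saving
depends entirely on the count staying accurate, which is why the cover
letter ties it to the rmap walk lock, and that added locking requirement is
one of the objections raised above.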