Message-ID: <5a01f170-ce04-4f2c-b692-56469ebeb72c@nvidia.com>
Date: Sat, 5 Jul 2025 09:56:36 +1000
From: Balbir Singh <balbirs@...dia.com>
To: Zi Yan <ziy@...dia.com>
Cc: linux-mm@...ck.org, akpm@...ux-foundation.org,
linux-kernel@...r.kernel.org, Karol Herbst <kherbst@...hat.com>,
Lyude Paul <lyude@...hat.com>, Danilo Krummrich <dakr@...nel.org>,
David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Jérôme Glisse <jglisse@...hat.com>,
Shuah Khan <shuah@...nel.org>, David Hildenbrand <david@...hat.com>,
Barry Song <baohua@...nel.org>, Baolin Wang <baolin.wang@...ux.alibaba.com>,
Ryan Roberts <ryan.roberts@....com>, Matthew Wilcox <willy@...radead.org>,
Peter Xu <peterx@...hat.com>, Kefeng Wang <wangkefeng.wang@...wei.com>,
Jane Chu <jane.chu@...cle.com>, Alistair Popple <apopple@...dia.com>,
Donet Tom <donettom@...ux.ibm.com>
Subject: Re: [v1 resend 00/12] THP support for zone device page migration
On 7/5/25 02:16, Zi Yan wrote:
> On 3 Jul 2025, at 19:34, Balbir Singh wrote:
>
>> This patch series adds support for THP migration of zone device pages.
>> To do so, the patches implement support for folio zone device pages
>> by adding support for setting up larger order pages.
>>
>> These patches build on the earlier posts by Ralph Campbell [1]
>>
>> Two new flags are added to the migrate_vma interface to select and
>> mark compound pages. migrate_vma_setup(), migrate_vma_pages() and
>> migrate_vma_finalize() support migration of these pages when
>> MIGRATE_VMA_SELECT_COMPOUND is passed in as an argument.
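>>
>> As a rough illustration, a driver requests a compound migration along
>> these lines (a sketch, not the exact driver diff; drvmem and the error
>> handling are placeholders, and real code would allocate the pfn arrays):
>>
>>         unsigned long src_pfns[HPAGE_PMD_NR] = { 0 };
>>         unsigned long dst_pfns[HPAGE_PMD_NR] = { 0 };
>>         struct migrate_vma args = {
>>                 .vma            = vma,
>>                 .start          = start,        /* PMD-aligned */
>>                 .end            = start + HPAGE_PMD_SIZE,
>>                 .src            = src_pfns,
>>                 .dst            = dst_pfns,
>>                 .pgmap_owner    = drvmem->pagemap.owner,
>>                 .flags          = MIGRATE_VMA_SELECT_SYSTEM |
>>                                   MIGRATE_VMA_SELECT_COMPOUND,
>>         };
>>
>>         if (migrate_vma_setup(&args))
>>                 return -EFAULT;
>>         /* allocate device memory, copy data, set MIGRATE_PFN_COMPOUND
>>          * in dst_pfns[0], then: */
>>         migrate_vma_pages(&args);
>>         migrate_vma_finalize(&args);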
>>
>> The series also adds zone device awareness to (m)THP pages, along
>> with fault handling of large zone device private pages. The page vma
>> walk and the rmap code are also made zone device aware. Support has
>> also been added for folios that need to be split in the middle of
>> migration (when src and dst do not agree on MIGRATE_PFN_COMPOUND),
>> which occurs when the source side of the migration can migrate large
>> pages but the destination has not been able to allocate large pages.
>> The code supports and uses folio_split() when migrating THP pages;
>> this path is taken when MIGRATE_VMA_SELECT_COMPOUND is not passed as
>> an argument to migrate_vma_setup().
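>>
>> In pseudocode, the fallback looks roughly like the following (purely
>> illustrative; the folio_split() signature follows the non-uniform
>> split work and may differ in detail):
>>
>>         /* src entry was collected as compound, dst could not be */
>>         if ((src[i] & MIGRATE_PFN_COMPOUND) &&
>>             !(dst[i] & MIGRATE_PFN_COMPOUND)) {
>>                 struct folio *folio = page_folio(migrate_pfn_to_page(src[i]));
>>
>>                 /* split to order-0 and continue with base pages */
>>                 if (folio_split(folio, 0, &folio->page, NULL))
>>                         src[i] &= ~MIGRATE_PFN_MIGRATE; /* skip on failure */
>>         }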
>>
>> The test infrastructure lib/test_hmm.c has been enhanced to support
>> THP migration. A new ioctl to emulate failure of large page
>> allocations has been added to exercise the folio split code path.
>> hmm-tests.c has new test cases for huge page migration and for the
>> folio split path. A new throughput test has been added as well.
>>
>> The nouveau dmem code has been enhanced to use the new THP migration
>> capability.
>>
>> Feedback from the RFC [2]:
>>
>> It was advised that prep_compound_page() not be exposed just for the
>> purposes of testing (the test driver lib/test_hmm.c). Workarounds that
>> copy and split the folios did not work, due to a lock-order dependency
>> in the folio split callback.
>>
>> mTHP support:
>>
>> The patches hard-code HPAGE_PMD_NR in a few places, but the code has
>> been kept generic to support various order sizes. With additional
>> refactoring, support for different order sizes should be possible.
>>
>> The future plan is to post enhancements to support mTHP with a rough
>> design as follows:
>>
>> 1. Add the notion of allowable thp orders to the HMM based test driver
>> 2. For non-PMD-based THP paths in migrate_device.c, check whether a
>> suitable order is found and supported by the driver
>> 3. Iterate across orders to find the highest order supported for
>> migration (see the sketch after this list)
>> 4. Migrate and finalize
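>>
>> A sketch of the order iteration in step 3 (supported_orders is a
>> hypothetical bitmask advertised by the driver):
>>
>>         unsigned int order;
>>
>>         for (order = HPAGE_PMD_ORDER; order > 0; order--) {
>>                 if (!(drv->supported_orders & BIT(order)))
>>                         continue;
>>                 if (IS_ALIGNED(addr, PAGE_SIZE << order) &&
>>                     addr + (PAGE_SIZE << order) <= vma->vm_end)
>>                         break;  /* highest usable order found */
>>         }
>>         /* order == 0 falls back to base page migration */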
>>
>> The mTHP patches can be built on top of this series; the key design
>> elements that need to be worked out are infrastructure and driver
>> support for multiple page orders and their migration.
>
> To help me better review the patches, can you tell me if my mental model below
> for device private folios is correct or not?
>
> 1. device private folios represent device memory, but the associated PFNs
> do not exist in the system. folio->pgmap contains the meta info about
> device memory.
Yes, that is right.
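
To make the pgmap point concrete: the device memory is typically carved
out with request_free_mem_region() and memremap_pages(), which create
struct pages for PFNs the CPU cannot address. A minimal sketch (drvmem,
drv_pagemap_ops and DEVMEM_CHUNK_SIZE are placeholders; error handling
elided):

        struct dev_pagemap *pgmap = &drvmem->pagemap;
        struct resource *res;

        res = request_free_mem_region(&iomem_resource, DEVMEM_CHUNK_SIZE,
                                      "device private");
        pgmap->type = MEMORY_DEVICE_PRIVATE;
        pgmap->range.start = res->start;
        pgmap->range.end = res->end;
        pgmap->nr_range = 1;
        pgmap->ops = &drv_pagemap_ops;  /* supplies migrate_to_ram() */
        pgmap->owner = drvmem;
        memremap_pages(pgmap, numa_node_id());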
>
> 2. when data is migrated from system memory to device private memory, a device
> private page table entry is established in place of the original entry.
> A device private page table entry is a swap entry with a device private type.
> And the swap entry points to a device private folio in which the data resides
> in the device private memory.
>
Yes
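
For completeness, such an entry is installed with the existing swapops
helpers, roughly as below (dpage is the device private page; error
handling elided):

        swp_entry_t entry;

        if (writable)
                entry = make_writable_device_private_entry(page_to_pfn(dpage));
        else
                entry = make_readable_device_private_entry(page_to_pfn(dpage));
        set_pte_at(mm, addr, ptep, swp_entry_to_pte(entry));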
> 3. when the CPU tries to access an address with a device private page
> table entry, a fault happens and the data is migrated from device
> private memory to system memory. The device private folio pointed to
> by the device private page table entry tells the driver where to look
> for the data on the device.
>
> 4. one of the reasons for splitting a large device private folio is
> that when a large device private folio is migrated back to system
> memory, there may be no free large folio in system memory. In that
> case the driver splits the large device private folio and only
> migrates a subpage instead.
>
Both points are correct. To add to point 4, it can also happen that
during migration of memory from system to device we need a split;
effectively, the destination is unable to allocate a large page for the
migration.
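
On the driver side, point 3 is the migrate_to_ram() pagemap callback; a
trimmed sketch (drv_owner and drv_copy_from_device() are placeholders,
error unwinding elided):

        static vm_fault_t drv_migrate_to_ram(struct vm_fault *vmf)
        {
                unsigned long src = 0, dst = 0;
                struct page *spage;
                struct migrate_vma args = {
                        .vma            = vmf->vma,
                        .start          = vmf->address & PAGE_MASK,
                        .end            = (vmf->address & PAGE_MASK) + PAGE_SIZE,
                        .src            = &src,
                        .dst            = &dst,
                        .pgmap_owner    = drv_owner,
                        .flags          = MIGRATE_VMA_SELECT_DEVICE_PRIVATE,
                        .fault_page     = vmf->page,
                };

                if (migrate_vma_setup(&args))
                        return VM_FAULT_SIGBUS;
                if (src & MIGRATE_PFN_MIGRATE) {
                        spage = alloc_page(GFP_HIGHUSER_MOVABLE);
                        if (!spage)
                                return VM_FAULT_OOM;
                        lock_page(spage);
                        drv_copy_from_device(spage, vmf->page);
                        dst = migrate_pfn(page_to_pfn(spage));
                }
                migrate_vma_pages(&args);
                migrate_vma_finalize(&args);
                return 0;
        }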
Balbir