Message-ID: <c629f600-94c9-4cda-990c-83e429a2b9a1@redhat.com>
Date: Fri, 15 Aug 2025 14:44:54 +0300
From: Mika Penttilä <mpenttil@...hat.com>
To: Balbir Singh <balbirs@...dia.com>, linux-mm@...ck.org
Cc: linux-kernel@...r.kernel.org, David Hildenbrand <david@...hat.com>,
Jason Gunthorpe <jgg@...dia.com>, Leon Romanovsky <leonro@...dia.com>,
Alistair Popple <apopple@...dia.com>
Subject: Re: [RFC PATCH 0/4] Migrate on fault for device pages
On 8/15/25 14:36, Balbir Singh wrote:
> On 8/14/25 17:19, Mika Penttilä wrote:
>> As of this writing, device page faulting and migration do not
>> work well together if you want to handle a fault and migrate
>> in one operation.
>>
>> Migrating non-present pages (or pages mapped with incorrect
>> permissions, e.g. COW) to the GPU requires one of the following
>> sequences:
>>
>> 1. hmm_range_fault() - fault in non-present pages with correct
>>    permissions, etc.
>> 2. migrate_vma_*() - migrate the pages
>>
>> Or:
>>
>> 1. migrate_vma_*() - migrate present pages
>> 2. If non-present pages detected by migrate_vma_*():
>>    a) call hmm_range_fault() to fault pages in
>>    b) call migrate_vma_*() again to migrate now present pages
>>
>> The problem with the first sequence is that you always have to do two
>> page walks, even though most of the time the pages are present or are
>> zero-page mappings, so the common case takes a performance hit.
>>
>> The second sequence is better for the common case, but far worse if
>> pages aren't present because now you have to walk the page tables three
>> times (once to find the page is not present, once so hmm_range_fault()
>> can find a non-present page to fault in and once again to set up the
>> migration). It is also tricky to code correctly.
>>
>> We should be able to walk the page table once, faulting
>> pages in as required and replacing them with migration entries if
>> requested.
>>
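
To make the walk counts above concrete, here is a toy userspace model of
the three approaches. It is only a sketch and not kernel code: every
name in it (toy_range, toy_fault_walk(), toy_migrate_walk() and so on)
is hypothetical, and the two walk helpers are simplified stand-ins for
hmm_range_fault() and migrate_vma_*(), not their real behaviour. It
assumes at most one page in the range starts out non-present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PAGES 16

/* Toy model of a virtual address range; NOT a kernel structure. */
struct toy_range {
	bool present[TOY_MAX_PAGES];   /* is page i mapped? */
	bool collected[TOY_MAX_PAGES]; /* locked + ref taken for migration */
	size_t npages;
	unsigned walks;                /* page table walks done so far */
};

/* Build a range of npages pages; optionally leave page 0 non-present. */
struct toy_range toy_range_init(size_t npages, bool one_missing)
{
	struct toy_range r = { .npages = npages };

	assert(npages > 0 && npages <= TOY_MAX_PAGES);
	for (size_t i = 0; i < npages; i++)
		r.present[i] = true;
	if (one_missing)
		r.present[0] = false;
	return r;
}

/* Number of pages currently collected for migration. */
size_t toy_collected(const struct toy_range *r)
{
	size_t n = 0;

	for (size_t i = 0; i < r->npages; i++)
		n += r->collected[i];
	return n;
}

/* Stand-in for hmm_range_fault(): one walk, faults in missing pages. */
static void toy_fault_walk(struct toy_range *r)
{
	r->walks++;
	for (size_t i = 0; i < r->npages; i++)
		r->present[i] = true;
}

/*
 * Stand-in for migrate_vma_*(): one walk, collects present pages only.
 * Returns true once every page in the range has been collected.
 */
static bool toy_migrate_walk(struct toy_range *r)
{
	r->walks++;
	for (size_t i = 0; i < r->npages; i++)
		if (r->present[i])
			r->collected[i] = true;
	return toy_collected(r) == r->npages;
}

/* First sequence: always fault, then migrate. */
unsigned toy_fault_then_migrate(struct toy_range *r)
{
	toy_fault_walk(r);
	toy_migrate_walk(r);
	return r->walks;
}

/* Second sequence: migrate first, fault and retry only if needed. */
unsigned toy_migrate_then_fault(struct toy_range *r)
{
	if (!toy_migrate_walk(r)) {
		toy_fault_walk(r);
		toy_migrate_walk(r);
	}
	return r->walks;
}

/* Proposed behaviour: a single walk that faults in and collects. */
unsigned toy_single_walk(struct toy_range *r)
{
	r->walks++;
	for (size_t i = 0; i < r->npages; i++) {
		r->present[i] = true;   /* fault in if it was missing */
		r->collected[i] = true; /* collect as soon as present */
	}
	return r->walks;
}
```

In this model the first sequence always costs two walks, the second
costs one walk when everything is present and three when a page is
missing, and the single-walk variant costs one walk in both cases.
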
> The use case makes sense to me, but isn't the sequence always going
> to be racy? By the time the pages are faulted in, there could be
> others that have been marked non-present. Or do you intend to lock
> all pages during this operation?
>
> Balbir
Yes, the pages are "collected", i.e. locked and a reference taken, as soon as they are faulted in.
--Mika
>