linux-ext4 - Re: [PATCH v2 03/11] mm/gup: migrate PIN_LONGTERM dev coherent pages to system

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4ceda6e1-76c4-d92a-e4a3-a78d08620058@amd.com>
Date:   Fri, 10 Dec 2021 11:39:39 -0500
From:   Felix Kuehling <felix.kuehling@....com>
To:     Alistair Popple <apopple@...dia.com>, akpm@...ux-foundation.org,
        linux-mm@...ck.org, rcampbell@...dia.com,
        linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org,
        "Sierra Guiza, Alejandro (Alex)" <alex.sierra@....com>
Cc:     amd-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
        jglisse@...hat.com, willy@...radead.org, jgg@...dia.com, hch@....de
Subject: Re: [PATCH v2 03/11] mm/gup: migrate PIN_LONGTERM dev coherent pages
 to system

On 2021-12-09 8:31 p.m., Alistair Popple wrote:
> On Friday, 10 December 2021 3:54:31 AM AEDT Sierra Guiza, Alejandro (Alex) wrote:
>> On 12/9/2021 10:29 AM, Felix Kuehling wrote:
>>> Am 2021-12-09 um 5:53 a.m. schrieb Alistair Popple:
>>>> On Thursday, 9 December 2021 5:55:26 AM AEDT Sierra Guiza, Alejandro (Alex) wrote:
>>>>> On 12/8/2021 11:30 AM, Felix Kuehling wrote:
>>>>>> Am 2021-12-08 um 11:58 a.m. schrieb Felix Kuehling:
>>>>>>> Am 2021-12-08 um 6:31 a.m. schrieb Alistair Popple:
>>>>>>>> On Tuesday, 7 December 2021 5:52:43 AM AEDT Alex Sierra wrote:
>>>>>>>>> Avoid long term pinning for Coherent device type pages. This could
>>>>>>>>> interfere with their own device memory manager.
>>>>>>>>> If caller tries to get user device coherent pages with PIN_LONGTERM flag
>>>>>>>>> set, those pages will be migrated back to system memory.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Alex Sierra<alex.sierra@....com>
>>>>>>>>> ---
>>>>>>>>>     mm/gup.c | 32 ++++++++++++++++++++++++++++++--
>>>>>>>>>     1 file changed, 30 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>>>>>>> index 886d6148d3d0..1572eacf07f4 100644
>>>>>>>>> --- a/mm/gup.c
>>>>>>>>> +++ b/mm/gup.c
>>>>>>>>> @@ -1689,17 +1689,37 @@ struct page *get_dump_page(unsigned long addr)
>>>>>>>>>     #endif /* CONFIG_ELF_CORE */
>>>>>>>>>     
>>>>>>>>>     #ifdef CONFIG_MIGRATION
>>>>>>>>> +static int migrate_device_page(unsigned long address,
>>>>>>>>> +				struct page *page)
>>>>>>>>> +{
>>>>>>>>> +	struct vm_area_struct *vma = find_vma(current->mm, address);
>>>>>>>>> +	struct vm_fault vmf = {
>>>>>>>>> +		.vma = vma,
>>>>>>>>> +		.address = address & PAGE_MASK,
>>>>>>>>> +		.flags = FAULT_FLAG_USER,
>>>>>>>>> +		.pgoff = linear_page_index(vma, address),
>>>>>>>>> +		.gfp_mask = GFP_KERNEL,
>>>>>>>>> +		.page = page,
>>>>>>>>> +	};
>>>>>>>>> +	if (page->pgmap && page->pgmap->ops->migrate_to_ram)
>>>>>>>>> +		return page->pgmap->ops->migrate_to_ram(&vmf);
>>>>>>>> How does this synchronise against pgmap being released? As I understand things
>>>>>>>> at this point we're not holding a reference on either the page or pgmap, so
>>>>>>>> the page and therefore the pgmap may have been freed.
>>>>>>>>
>>>>>>>> I think a similar problem exists for device private fault handling as well and
>>>>>>>> it has been on my list of things to fix for a while. I think the solution is to
>>>>>>>> call try_get_page(), except it doesn't work with device pages due to the whole
>>>>>>>> refcount thing. That issue is blocking a fair bit of work now so I've started
>>>>>>>> looking into it.
>>>>>>> At least the page should have been pinned by the __get_user_pages_locked
>>>>>>> call in __gup_longterm_locked. That refcount is dropped in
>>>>>>> check_and_migrate_movable_pages when it returns 0 or an error.
>>>>>> Never mind. We unpin the pages first. Alex, would the migration work if
>>>>>> we unpinned them afterwards? Also, the normal CPU page fault code path
>>>>>> seems to make sure the page is locked (check in pfn_swap_entry_to_page)
>>>>>> before calling migrate_to_ram.
>>>> I don't think that's true. The check in pfn_swap_entry_to_page() is only for
>>>> migration entries:
>>>>
>>>> 	BUG_ON(is_migration_entry(entry) && !PageLocked(p));
>>>>
>>>> As this is coherent memory though why do we have to call into a device driver
>>>> to do the migration? Couldn't this all be done in the kernel?
>>> I think you're right. I hadn't thought of that mainly because I'm even
>>> less familiar with the non-device migration code. Alex, can you give
>>> that a try? As long as the driver still gets a page-free callback when
>>> the device page is freed, it should work.
> Yes, you should still get the page-free callback when the migration code drops
> the last page reference.
>
>> ACK.Will do
> There is currently not really any support for migrating device pages based on
> pfn. What I think is needed is something like migrate_pages(), but that API
> won't work for a couple of reasons - main one being that it relies on pages
> being LRU pages.
>
> I've been working on a series to implement an equivalent of migrate_pages() for
> device-private (and by extension device-coherent) pages. It might also be useful
> here so I will try and get it posted as an RFC next week.
If we want to make progress on this patch series in the shorter term, we 
could just fail get_user_pages with FOLL_LONGTERM for DEVICE_COHERENT 
pages. Then add the migration support when your patch series is ready.

Regards,
   Felix


>
>   - Alistair
>
>> Alex Sierra
>>
>>> Regards,
>>>     Felix
>>>
>>>
>>>>> No, you can not unpinned after migration. Due to the expected_count VS
>>>>> page_count condition at migrate_page_move_mapping, during migrate_page call.
>>>>>
>>>>> Regards,
>>>>> Alex Sierra
>>>>>
>>>>>> Regards,
>>>>>>      Felix
>>>>>>
>>>>>>
>
>