lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <9eac8a3f-08c2-41f5-a468-1fe5c00a046c@redhat.com>
Date: Thu, 5 Jun 2025 11:13:39 +0200
From: David Hildenbrand <david@...hat.com>
To: Hyesoo Yu <hyesoo.yu@...sung.com>, janghyuck.kim@...sung.com,
 zhaoyang.huang@...soc.com, jaewon31.kim@...il.com,
 Andrew Morton <akpm@...ux-foundation.org>, Jason Gunthorpe <jgg@...pe.ca>,
 John Hubbard <jhubbard@...dia.com>, Peter Xu <peterx@...hat.com>,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/1] mm: gup: avoid CMA page pinning by retrying
 migration if no migratable page

On 05.06.25 10:39, Hyesoo Yu wrote:
> On Thu, Jun 05, 2025 at 05:04:31PM +0900, Hyesoo Yu wrote:
>> Commit 1aaf8c122918 ("mm: gup: fix infinite loop within __get_longterm_locked")
>> introduced an issue where CMA pages could be pinned by longterm GUP requests.
>> This occurs when unpinnable pages are detected but the movable_page_list is empty;
>> the commit would return success without retrying, allowing unpinnable
>> pages (such as CMA) to become pinned.
>>
>> CMA pages may be temporarily off the LRU due to concurrent isolation,
>> for example when multiple longterm GUP requests are racing and therefore
>> not appear in movable_page_list. Before commit 1aaf8c, the kernel would
>> retry migration in such cases, which helped avoid accidental CMA pinning.
>>
>> The original intent of the commit was to support longterm GUP on non-LRU
>> CMA pages in out-of-tree use cases such as pKVM. However, allowing this
>> can lead to broader CMA pinning issues.
>>
>> To avoid this, the logic is restored to return -EAGAIN instead of success
>> when no folios could be collected but unpinnable pages were found.
>> This ensures that migration is retried until success, and avoids
>> inadvertently pinning unpinnable pages.
>>
>> Fixes: 1aaf8c122918 ("mm: gup: fix infinite loop within __get_longterm_locked")
>> Acked-by: David Hildenbrand <david@...hat.com>
>> Signed-off-by: Hyesoo Yu <hyesoo.yu@...sung.com>
>>
>> ---
>> We have confirmed that this regression causes CMA pages to be pinned
>> in our kernel 6.12-based environment.
>>
>> In addition to CMA allocation failures, we also observed longterm GUP failures
>> when repeatedly accessing the same VMA. Specifically, the first longterm GUP
>> call would pin a CMA page, and a second call on the same region
>> would fail the migration because the cma page was already pinned.
>>
>> After reverting commit 1aaf8c122918, the issue no longer reproduced.
>>
>> Therefore, this fix is important to ensure reliable behavior of longterm GUP
>> and CMA-backed memory, and should be backported to stable.
>> ---
>>   mm/gup.c | 28 ++++++++++++++++++++++------
>>   1 file changed, 22 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index e065a49842a8..66193421c1d4 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -2300,14 +2300,12 @@ static void pofs_unpin(struct pages_or_folios *pofs)
>>   		unpin_user_pages(pofs->pages, pofs->nr_entries);
>>   }
>>   
>> -/*
>> - * Returns the number of collected folios. Return value is always >= 0.
>> - */
>> -static void collect_longterm_unpinnable_folios(
>> +static bool collect_longterm_unpinnable_folios(
>>   		struct list_head *movable_folio_list,
>>   		struct pages_or_folios *pofs)
>>   {
>>   	struct folio *prev_folio = NULL;
>> +	bool any_unpinnable = false;
>>   	bool drain_allow = true;
>>   	unsigned long i;
>>   
>> @@ -2321,6 +2319,8 @@ static void collect_longterm_unpinnable_folios(
>>   		if (folio_is_longterm_pinnable(folio))
>>   			continue;
>>   
>> +		any_unpinnable = true;
>> +
>>   		if (folio_is_device_coherent(folio))
>>   			continue;
>>
>> @@ -2342,6 +2342,8 @@ static void collect_longterm_unpinnable_folios(
>>   				    NR_ISOLATED_ANON + folio_is_file_lru(folio),
>>   				    folio_nr_pages(folio));
>>   	}
>> +
>> +	return any_unpinnable;
>>   }
>>   
>>   /*
>> @@ -2417,11 +2419,25 @@ migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
>>   static long
>>   check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
>>   {
>> +	bool any_unpinnable;
>> +
>>   	LIST_HEAD(movable_folio_list);
>>   
>> -	collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>> -	if (list_empty(&movable_folio_list))
>> +	any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>> +
> 
> Hi David,
> 
> While re-reviewing the v3 patch, I realized that migrate_longterm_unpinnable_folios()
> should always be called whenever unpinnable folios are present, regardless of whether
> the movable_folio_list is empty.
 > > In collect_longterm_unpinnable_folios(), if 
folio_is_device_coherent() is true,
> the folio is not added to movable_folio_list. However, such device-coherent folios
> still need to be migrated later in migrate_longterm_unpinnable_folios().

Ohh, because we cannot isolate them ... and they are always 
longterm-unpinnable.

> 
> I think the condition `list_empty(&movable_folio_list)` should be removed,
> and it might be better to revert commit 1aaf8c122918 rather than adding a new patch.
> 
> What do you think?

Yeah, with that in mind, a revert might indeed be the better option.

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ