lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250605083916.GA3770753@tiffany>
Date: Thu, 5 Jun 2025 17:39:16 +0900
From: Hyesoo Yu <hyesoo.yu@...sung.com>
To: janghyuck.kim@...sung.com, zhaoyang.huang@...soc.com,
	jaewon31.kim@...il.com, david@...hat.com, Andrew Morton
	<akpm@...ux-foundation.org>, Jason Gunthorpe <jgg@...pe.ca>, John Hubbard
	<jhubbard@...dia.com>, Peter Xu <peterx@...hat.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/1] mm: gup: avoid CMA page pinning by retrying
 migration if no migratable page

On Thu, Jun 05, 2025 at 05:04:31PM +0900, Hyesoo Yu wrote:
> Commit 1aaf8c122918 ("mm: gup: fix infinite loop within __get_longterm_locked")
> introduced an issue where CMA pages could be pinned by longterm GUP requests.
> This occurs when unpinnable pages are detected but the movable_page_list is empty;
> the commit would return success without retrying, allowing unpinnable
> pages (such as CMA) to become pinned.
> 
> CMA pages may be temporarily off the LRU due to concurrent isolation,
> for example when multiple longterm GUP requests are racing and therefore
> not appear in movable_page_list. Before commit 1aaf8c, the kernel would
> retry migration in such cases, which helped avoid accidental CMA pinning.
> 
> The original intent of the commit was to support longterm GUP on non-LRU
> CMA pages in out-of-tree use cases such as pKVM. However, allowing this
> can lead to broader CMA pinning issues.
> 
> To avoid this, the logic is restored to return -EAGAIN instead of success
> when no folios could be collected but unpinnable pages were found.
> This ensures that migration is retried until success, and avoids
> inadvertently pinning unpinnable pages.
> 
> Fixes: 1aaf8c122918 ("mm: gup: fix infinite loop within __get_longterm_locked")
> Acked-by: David Hildenbrand <david@...hat.com>
> Signed-off-by: Hyesoo Yu <hyesoo.yu@...sung.com>
> 
> ---
> We have confirmed that this regression causes CMA pages to be pinned
> in our kernel 6.12-based environment.
> 
> In addition to CMA allocation failures, we also observed longterm GUP failures
> when repeatedly accessing the same VMA. Specifically, the first longterm GUP
> call would pin a CMA page, and a second call on the same region
> would fail the migration because the cma page was already pinned.
> 
> After reverting commit 1aaf8c122918, the issue no longer reproduced.
> 
> Therefore, this fix is important to ensure reliable behavior of longterm GUP
> and CMA-backed memory, and should be backported to stable.
> ---
>  mm/gup.c | 28 ++++++++++++++++++++++------
>  1 file changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index e065a49842a8..66193421c1d4 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2300,14 +2300,12 @@ static void pofs_unpin(struct pages_or_folios *pofs)
>  		unpin_user_pages(pofs->pages, pofs->nr_entries);
>  }
>  
> -/*
> - * Returns the number of collected folios. Return value is always >= 0.
> - */
> -static void collect_longterm_unpinnable_folios(
> +static bool collect_longterm_unpinnable_folios(
>  		struct list_head *movable_folio_list,
>  		struct pages_or_folios *pofs)
>  {
>  	struct folio *prev_folio = NULL;
> +	bool any_unpinnable = false;
>  	bool drain_allow = true;
>  	unsigned long i;
>  
> @@ -2321,6 +2319,8 @@ static void collect_longterm_unpinnable_folios(
>  		if (folio_is_longterm_pinnable(folio))
>  			continue;
>  
> +		any_unpinnable = true;
> +
>  		if (folio_is_device_coherent(folio))
>  			continue;
>
> @@ -2342,6 +2342,8 @@ static void collect_longterm_unpinnable_folios(
>  				    NR_ISOLATED_ANON + folio_is_file_lru(folio),
>  				    folio_nr_pages(folio));
>  	}
> +
> +	return any_unpinnable;
>  }
>  
>  /*
> @@ -2417,11 +2419,25 @@ migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
>  static long
>  check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
>  {
> +	bool any_unpinnable;
> +
>  	LIST_HEAD(movable_folio_list);
>  
> -	collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> -	if (list_empty(&movable_folio_list))
> +	any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> +

Hi David,

While re-reviewing the v3 patch, I realized that migrate_longterm_unpinnable_folios()
should always be called whenever unpinnable folios are present, regardless of whether
the movable_folio_list is empty.

In collect_longterm_unpinnable_folios(), if folio_is_device_coherent() is true,
the folio is not added to movable_folio_list. However, such device-coherent folios
still need to be migrated later in migrate_longterm_unpinnable_folios().

I think the condition `list_empty(&movable_folio_list)` should be removed,
and it might be better to revert commit 1aaf8c122918 rather than adding a new patch.

What do you think?

Thanks,
Regards.

> +	if (list_empty(&movable_folio_list)) {
> +		/*
> +		 * If we find any longterm unpinnable page that we failed to
> +		 * isolated for migration, it might be because someone else
> +		 * concurrently isolated it. Make the caller retry until it
> +		 * succeeds.
> +		 */
> +		if (any_unpinnable) {
> +			pofs_unpin(pofs);
> +			return -EAGAIN;
> +		}
>  		return 0;
> +	}
>  
>  	return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
>  }
> -- 
> 2.49.0
> 
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ