[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250605083916.GA3770753@tiffany>
Date: Thu, 5 Jun 2025 17:39:16 +0900
From: Hyesoo Yu <hyesoo.yu@...sung.com>
To: janghyuck.kim@...sung.com, zhaoyang.huang@...soc.com,
jaewon31.kim@...il.com, david@...hat.com, Andrew Morton
<akpm@...ux-foundation.org>, Jason Gunthorpe <jgg@...pe.ca>, John Hubbard
<jhubbard@...dia.com>, Peter Xu <peterx@...hat.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/1] mm: gup: avoid CMA page pinning by retrying
migration if no migratable page
On Thu, Jun 05, 2025 at 05:04:31PM +0900, Hyesoo Yu wrote:
> Commit 1aaf8c122918 ("mm: gup: fix infinite loop within __get_longterm_locked")
> introduced an issue where CMA pages could be pinned by longterm GUP requests.
> This occurs when unpinnable pages are detected but the movable_page_list is empty;
> the commit would return success without retrying, allowing unpinnable
> pages (such as CMA) to become pinned.
>
> CMA pages may be temporarily off the LRU due to concurrent isolation,
> for example when multiple longterm GUP requests are racing and therefore
> not appear in movable_page_list. Before commit 1aaf8c, the kernel would
> retry migration in such cases, which helped avoid accidental CMA pinning.
>
> The original intent of the commit was to support longterm GUP on non-LRU
> CMA pages in out-of-tree use cases such as pKVM. However, allowing this
> can lead to broader CMA pinning issues.
>
> To avoid this, the logic is restored to return -EAGAIN instead of success
> when no folios could be collected but unpinnable pages were found.
> This ensures that migration is retried until success, and avoids
> inadvertently pinning unpinnable pages.
>
> Fixes: 1aaf8c122918 ("mm: gup: fix infinite loop within __get_longterm_locked")
> Acked-by: David Hildenbrand <david@...hat.com>
> Signed-off-by: Hyesoo Yu <hyesoo.yu@...sung.com>
>
> ---
> We have confirmed that this regression causes CMA pages to be pinned
> in our kernel 6.12-based environment.
>
> In addition to CMA allocation failures, we also observed longterm GUP failures
> when repeatedly accessing the same VMA. Specifically, the first longterm GUP
> call would pin a CMA page, and a second call on the same region
> would fail the migration because the cma page was already pinned.
>
> After reverting commit 1aaf8c122918, the issue no longer reproduced.
>
> Therefore, this fix is important to ensure reliable behavior of longterm GUP
> and CMA-backed memory, and should be backported to stable.
> ---
> mm/gup.c | 28 ++++++++++++++++++++++------
> 1 file changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index e065a49842a8..66193421c1d4 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2300,14 +2300,12 @@ static void pofs_unpin(struct pages_or_folios *pofs)
> unpin_user_pages(pofs->pages, pofs->nr_entries);
> }
>
> -/*
> - * Returns the number of collected folios. Return value is always >= 0.
> - */
> -static void collect_longterm_unpinnable_folios(
> +static bool collect_longterm_unpinnable_folios(
> struct list_head *movable_folio_list,
> struct pages_or_folios *pofs)
> {
> struct folio *prev_folio = NULL;
> + bool any_unpinnable = false;
> bool drain_allow = true;
> unsigned long i;
>
> @@ -2321,6 +2319,8 @@ static void collect_longterm_unpinnable_folios(
> if (folio_is_longterm_pinnable(folio))
> continue;
>
> + any_unpinnable = true;
> +
> if (folio_is_device_coherent(folio))
> continue;
>
> @@ -2342,6 +2342,8 @@ static void collect_longterm_unpinnable_folios(
> NR_ISOLATED_ANON + folio_is_file_lru(folio),
> folio_nr_pages(folio));
> }
> +
> + return any_unpinnable;
> }
>
> /*
> @@ -2417,11 +2419,25 @@ migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
> static long
> check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> {
> + bool any_unpinnable;
> +
> LIST_HEAD(movable_folio_list);
>
> - collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> - if (list_empty(&movable_folio_list))
> + any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> +
Hi David,
While re-reviewing the v3 patch, I realized that migrate_longterm_unpinnable_folios()
should always be called whenever unpinnable folios are present, regardless of whether
the movable_folio_list is empty.
In collect_longterm_unpinnable_folios(), if folio_is_device_coherent() is true,
the folio is not added to movable_folio_list. However, such device-coherent folios
still need to be migrated later in migrate_longterm_unpinnable_folios().
I think the condition `list_empty(&movable_folio_list)` should be removed,
and it might be better to revert commit 1aaf8c122918 rather than adding a new patch.
What do you think?
Thanks,
Regards.
> + if (list_empty(&movable_folio_list)) {
> + /*
> + * If we find any longterm unpinnable page that we failed to
> + * isolated for migration, it might be because someone else
> + * concurrently isolated it. Make the caller retry until it
> + * succeeds.
> + */
> + if (any_unpinnable) {
> + pofs_unpin(pofs);
> + return -EAGAIN;
> + }
> return 0;
> + }
>
> return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
> }
> --
> 2.49.0
>
>
Powered by blists - more mailing lists