Message-ID: <20250605051131.GA3407065@tiffany>
Date: Thu, 5 Jun 2025 14:11:31 +0900
From: Hyesoo Yu <hyesoo.yu@...sung.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: janghyuck.kim@...sung.com, zhaoyang.huang@...soc.com,
jaewon31.kim@...il.com, david@...hat.com, Jason Gunthorpe <jgg@...pe.ca>,
John Hubbard <jhubbard@...dia.com>, Peter Xu <peterx@...hat.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 2/2] mm: gup: avoid CMA page pinning by retrying
migration if no migratable page
On Wed, Jun 04, 2025 at 08:43:23PM -0700, Andrew Morton wrote:
> On Thu, 5 Jun 2025 12:32:07 +0900 Hyesoo Yu <hyesoo.yu@...sung.com> wrote:
>
> > Commit 1aaf8c122918 ("mm: gup: fix infinite loop within __get_longterm_locked")
> > introduced an issue where CMA pages could be pinned by longterm GUP requests.
> > This occurs when unpinnable pages are detected but the movable_page_list is empty;
> > the commit would return success without retrying, allowing unpinnable
> > pages (such as CMA) to become pinned.
> >
> > CMA pages may be temporarily off the LRU due to concurrent isolation,
> > for example when multiple longterm GUP requests are racing, and therefore
> > do not appear in movable_page_list. Before commit 1aaf8c122918, the kernel
> > would retry migration in such cases, which helped avoid accidental CMA pinning.
> >
> > The original intent of the commit was to support longterm GUP on non-LRU
> > CMA pages in out-of-tree use cases such as pKVM. However, allowing this
> > can lead to broader CMA pinning issues.
> >
> > To avoid this, the logic is restored to return -EAGAIN instead of success
> > when no folios could be collected but unpinnable pages were found.
> > This ensures that migration is retried until success, and avoids
> > inadvertently pinning unpinnable pages.
> >
> > Fixes: 1aaf8c122918 ("mm: gup: fix infinite loop within __get_longterm_locked")
>
> v6.14.
>
> As ever, a question is "should we backport this fix". To answer that
> we should understand the effect the regression has upon our users.
> Readers can guess, but it's better if you tell us this, please?
>
Hi Andrew,

We have confirmed that this regression causes CMA pages to be pinned
in our 6.12-based kernel environment.

In addition to CMA allocation failures, we also observed longterm GUP
failures when the same VMA was accessed repeatedly. Specifically, the
first longterm GUP call would pin a CMA page, and a second call on the
same region would then fail migration because that CMA page was already
pinned.

After reverting commit 1aaf8c122918, the issue no longer reproduced.

Therefore, this fix is important for reliable behavior of longterm GUP
and CMA-backed memory, and it should be backported to stable.
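
For illustration only, the decision described in the patch can be modeled
in user space roughly as below. This is a minimal sketch and not the code
in mm/gup.c: struct model_page, model_check_pages() and the
retry_when_empty flag are hypothetical names made up for this example. It
assumes a return value of 0 means "the pin is accepted as-is" and -EAGAIN
means "the caller must retry".

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; not a kernel structure. */
struct model_page {
	bool longterm_pinnable;	/* false for e.g. a CMA page */
	bool isolated;		/* true if temporarily off the LRU */
};

/*
 * Model of the check done before a longterm pin is accepted.
 *
 * retry_when_empty == false models the behavior described as the
 * regression: unpinnable pages were seen, nothing could be collected
 * for migration, and success is returned anyway.
 *
 * retry_when_empty == true models the restored behavior: the same
 * situation yields -EAGAIN so that the caller retries.
 */
int model_check_pages(const struct model_page *pages, size_t n,
		      bool retry_when_empty)
{
	size_t unpinnable = 0;
	size_t collected = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (pages[i].longterm_pinnable)
			continue;
		unpinnable++;
		/* Only pages that are not isolated can be collected. */
		if (!pages[i].isolated)
			collected++;
	}

	if (!unpinnable)
		return 0;	/* nothing to migrate, pin is fine */

	if (!collected)
		return retry_when_empty ? -EAGAIN : 0;

	/* Assumption in this model: migrate, then ask the caller to retry. */
	return -EAGAIN;
}
```

With a single unpinnable page that is currently isolated, the model
returns 0 when retry_when_empty is false (the page would stay pinned) and
-EAGAIN when it is true.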
Thanks and regards.