[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local>
Date: Thu, 18 Sep 2025 11:06:12 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Lance Yang <lance.yang@...ux.dev>
Cc: Dev Jain <dev.jain@....com>, akpm@...ux-foundation.org, david@...hat.com,
ziy@...dia.com, baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com,
npache@...hat.com, ryan.roberts@....com, baohua@...nel.org,
ioworker0@...il.com, kirill@...temov.name, hughd@...gle.com,
mpenttil@...hat.com, linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH mm-new v2 2/2] mm/khugepaged: abort collapse scan on
guard PTEs
On Thu, Sep 18, 2025 at 04:11:21PM +0800, Lance Yang wrote:
>
>
> On 2025/9/18 15:37, Dev Jain wrote:
> >
> > On 18/09/25 10:34 am, Lance Yang wrote:
> > > From: Lance Yang <lance.yang@...ux.dev>
> > >
> > > Guard PTE markers are installed via MADV_GUARD_INSTALL to create
> > > lightweight guard regions.
> > >
> > > Currently, any collapse path (khugepaged or MADV_COLLAPSE) will fail when
> > > encountering such a range.
> > >
> > > MADV_COLLAPSE fails deep inside the collapse logic when trying to swap-in
> > > the special marker in __collapse_huge_page_swapin().
> > >
> > > hpage_collapse_scan_pmd()
> > > `- collapse_huge_page()
> > > `- __collapse_huge_page_swapin() -> fails!
> > >
> > > khugepaged's behavior is slightly different due to its max_ptes_swap
> > > limit
> > > (default 64). It won't fail as deep, but it will still needlessly scan up
> > > to 64 swap entries before bailing out.
> > >
> > > IMHO, we can and should detect this much earlier.
> > >
> > > This patch adds a check directly inside the PTE scan loop. If a guard
> > > marker is found, the scan is aborted immediately with
> > > SCAN_PTE_NON_PRESENT,
> > > avoiding wasted work.
> > >
> > > Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > > Signed-off-by: Lance Yang <lance.yang@...ux.dev>
> > > ---
> > > mm/khugepaged.c | 10 ++++++++++
> > > 1 file changed, 10 insertions(+)
> > >
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index 9ed1af2b5c38..70ebfc7c1f3e 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -1306,6 +1306,16 @@ static int hpage_collapse_scan_pmd(struct
> > > mm_struct *mm,
> > > result = SCAN_PTE_UFFD_WP;
> > > goto out_unmap;
> > > }
> > > + /*
> > > + * Guard PTE markers are installed by
> > > + * MADV_GUARD_INSTALL. Any collapse path must
> > > + * not touch them, so abort the scan immediately
> > > + * if one is found.
> > > + */
> > > + if (is_guard_pte_marker(pteval)) {
> > > + result = SCAN_PTE_NON_PRESENT;
> > > + goto out_unmap;
> > > + }
> > > continue;
> > > } else {
> > > result = SCAN_EXCEED_SWAP_PTE;
> > >
> > >
> >
> > I would like to hear everyone else's thoughts on
> > https://lore.kernel.org/linux-mm/750a06dc-db3d-43c6-
> > b234-95efb393a9df@....com/
> > wherein I suggest that we should not continue to try collapsing other
> > regions
> > but immediately exit. The SCAN_PTE_NON_PRESENT case does not exit.
>
> Yes! Let's hear what other folks think on that[1].
>
> [1] https://lore.kernel.org/linux-mm/c9d4d761-202f-48ce-8e3d-fb9075671ff3@linux.dev
Since the code has changed let's discuss on this thread.
Dev - You can have guard regions in a range that prevent one PMD from being
collapsed, I'm struggling to understand why you'd want to abort the whole
thing?
Your reasoning there isn't clear at all, so if I had a guard region in one
page in a giant range I was trying to collapse, you're saying we should
just abort the whole thing?
I really don't understand why we would do that? You just skip over what you
can't collapse right?
There's no reason at all to assume that overlapping regions here matter, we
can't predict how users will use this.
As Lance says, it's best effort. And also note we already do this with UFFD
WP. And note this is also a non-present, PTE marker.
And also this would change existing behaviour which treats this as a swap
entry then just fails later down the line right?
So yeah I don't agree, I think it's fine as is, unless I'm missing
something here.
Cheers, Lorenzo
Powered by blists - more mailing lists