[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAA1CXcAhj3UKqjGLVCOwGmUY=eDJSnvMZO+byF9b6GJUR9gRiQ@mail.gmail.com>
Date: Wed, 26 Nov 2025 16:16:05 -0700
From: Nico Pache <npache@...hat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
linux-mm@...ck.org, linux-doc@...r.kernel.org, david@...hat.com,
ziy@...dia.com, baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com,
ryan.roberts@....com, dev.jain@....com, corbet@....net, rostedt@...dmis.org,
mhiramat@...nel.org, mathieu.desnoyers@...icios.com,
akpm@...ux-foundation.org, baohua@...nel.org, willy@...radead.org,
peterx@...hat.com, wangkefeng.wang@...wei.com, usamaarif642@...il.com,
sunnanyong@...wei.com, vishal.moola@...il.com,
thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com, kas@...nel.org,
aarcange@...hat.com, raquini@...hat.com, anshuman.khandual@....com,
catalin.marinas@....com, tiwai@...e.de, will@...nel.org,
dave.hansen@...ux.intel.com, jack@...e.cz, cl@...two.org, jglisse@...gle.com,
surenb@...gle.com, zokeefe@...gle.com, hannes@...xchg.org,
rientjes@...gle.com, mhocko@...e.com, rdunlap@...radead.org, hughd@...gle.com,
richard.weiyang@...il.com, lance.yang@...ux.dev, vbabka@...e.cz,
rppt@...nel.org, jannh@...gle.com, pfalcato@...e.de
Subject: Re: [PATCH v12 mm-new 13/15] khugepaged: avoid unnecessary mTHP
collapse attempts
On Wed, Nov 19, 2025 at 5:06 AM Lorenzo Stoakes
<lorenzo.stoakes@...cle.com> wrote:
>
> On Wed, Oct 22, 2025 at 12:37:15PM -0600, Nico Pache wrote:
> > There are cases where, if an attempted collapse fails, all subsequent
> > orders are guaranteed to also fail. Avoid these collapse attempts by
> > bailing out early.
> >
> > Signed-off-by: Nico Pache <npache@...hat.com>
> > ---
> > mm/khugepaged.c | 31 ++++++++++++++++++++++++++++++-
> > 1 file changed, 30 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index e2319bfd0065..54f5c7888e46 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1431,10 +1431,39 @@ static int collapse_scan_bitmap(struct mm_struct *mm, unsigned long address,
> > ret = collapse_huge_page(mm, address, referenced,
> > unmapped, cc, mmap_locked,
> > order, offset);
> > - if (ret == SCAN_SUCCEED) {
> > +
> > + /*
> > + * Analyze failure reason to determine next action:
> > + * - goto next_order: try smaller orders in same region
> > + * - continue: try other regions at same order
>
> The stack is a DFS, so this isn't correct, you may have pushed a bunch of higher
> order candidate mTHPs (I don't like plain 'region') which you will also true.
Ah yeah so it should just be try other "regions" or in this case we
want something like "try to collapse another mTHP candidate in the
stack"
>
> > + * - break: stop all attempts (system-wide failure)
> > + */
>
> This comment isn't hugely helpful, just put the relevant comments next to each
> of the goto, continue, break (soon to be return re: review below) statements
> please.
ack
>
> > + switch (ret) {
> > + /* Cases were we should continue to the next region */
> > + case SCAN_SUCCEED:
> > collapsed += 1UL << order;
> > + fallthrough;
> > + case SCAN_PTE_MAPPED_HUGEPAGE:
> > continue;
>
> Would we not run into trouble potentially in the previous patch's implementation
> of this examing candidate mTHPs that are part of an already existing huge page,
> or would a folio check in the collapse just make this wasted work?
>
> > + /* Cases were lower orders might still succeed */
> > + case SCAN_LACK_REFERENCED_PAGE:
> > + case SCAN_EXCEED_NONE_PTE:
>
> How can we, having checked the max_pte_none, still fail due to this?
There are two phases in the khugepaged code, scan and collapse. in
between them is an alloc which requires dropping the lock, and
reconfirming values (in the collapse phase) after relocking.
During this time, the state of the PMD range might have changed and
our thresholds may have been exceeded.
This was true for PMD collapse and holds true for mTHP collapse too.
>
> > + case SCAN_EXCEED_SWAP_PTE:
> > + case SCAN_EXCEED_SHARED_PTE:
> > + case SCAN_PAGE_LOCK:
> > + case SCAN_PAGE_COUNT:
> > + case SCAN_PAGE_LRU:
> > + case SCAN_PAGE_NULL:
> > + case SCAN_DEL_PAGE_LRU:
> > + case SCAN_PTE_NON_PRESENT:
> > + case SCAN_PTE_UFFD_WP:
> > + case SCAN_ALLOC_HUGE_PAGE_FAIL:
> > + goto next_order;
> > + /* All other cases should stop collapse attempts */
>
> I don't love us having a catch-all, plase spell out all cases.
Ok sounds good, quick question, do we spell out ALL the enums or just
the ones that are reachable from here?
>
> > + default:
> > + break;
> > }
> > + break;
>
> _Hate_ this double break. Just return collapsed please.
ack, yeah that's much better. Thanks!
-- Nico
>
> > }
> >
> > next_order:
> > --
> > 2.51.0
> >
>
Powered by blists - more mailing lists