[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20251118020034.rdgisvkqs53lwabz@master>
Date: Tue, 18 Nov 2025 02:00:34 +0000
From: Wei Yang <richard.weiyang@...il.com>
To: Nico Pache <npache@...hat.com>
Cc: Wei Yang <richard.weiyang@...il.com>, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-doc@...r.kernel.org, david@...hat.com, ziy@...dia.com,
baolin.wang@...ux.alibaba.com, lorenzo.stoakes@...cle.com,
Liam.Howlett@...cle.com, ryan.roberts@....com, dev.jain@....com,
corbet@....net, rostedt@...dmis.org, mhiramat@...nel.org,
mathieu.desnoyers@...icios.com, akpm@...ux-foundation.org,
baohua@...nel.org, willy@...radead.org, peterx@...hat.com,
wangkefeng.wang@...wei.com, usamaarif642@...il.com,
sunnanyong@...wei.com, vishal.moola@...il.com,
thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com,
kas@...nel.org, aarcange@...hat.com, raquini@...hat.com,
anshuman.khandual@....com, catalin.marinas@....com, tiwai@...e.de,
will@...nel.org, dave.hansen@...ux.intel.com, jack@...e.cz,
cl@...two.org, jglisse@...gle.com, surenb@...gle.com,
zokeefe@...gle.com, hannes@...xchg.org, rientjes@...gle.com,
mhocko@...e.com, rdunlap@...radead.org, hughd@...gle.com,
lance.yang@...ux.dev, vbabka@...e.cz, rppt@...nel.org,
jannh@...gle.com, pfalcato@...e.de
Subject: Re: [PATCH v12 mm-new 13/15] khugepaged: avoid unnecessary mTHP
collapse attempts
On Mon, Nov 17, 2025 at 11:16:53AM -0700, Nico Pache wrote:
>On Sat, Nov 8, 2025 at 7:40 PM Wei Yang <richard.weiyang@...il.com> wrote:
>>
>> On Wed, Oct 22, 2025 at 12:37:15PM -0600, Nico Pache wrote:
>> >There are cases where, if an attempted collapse fails, all subsequent
>> >orders are guaranteed to also fail. Avoid these collapse attempts by
>> >bailing out early.
>> >
>> >Signed-off-by: Nico Pache <npache@...hat.com>
>> >---
>> > mm/khugepaged.c | 31 ++++++++++++++++++++++++++++++-
>> > 1 file changed, 30 insertions(+), 1 deletion(-)
>> >
>> >diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> >index e2319bfd0065..54f5c7888e46 100644
>> >--- a/mm/khugepaged.c
>> >+++ b/mm/khugepaged.c
>> >@@ -1431,10 +1431,39 @@ static int collapse_scan_bitmap(struct mm_struct *mm, unsigned long address,
>> > ret = collapse_huge_page(mm, address, referenced,
>> > unmapped, cc, mmap_locked,
>> > order, offset);
>> >- if (ret == SCAN_SUCCEED) {
>> >+
>> >+ /*
>> >+ * Analyze failure reason to determine next action:
>> >+ * - goto next_order: try smaller orders in same region
>> >+ * - continue: try other regions at same order
>> >+ * - break: stop all attempts (system-wide failure)
>> >+ */
>> >+ switch (ret) {
>> >+ /* Cases were we should continue to the next region */
>> >+ case SCAN_SUCCEED:
>> > collapsed += 1UL << order;
>> >+ fallthrough;
>> >+ case SCAN_PTE_MAPPED_HUGEPAGE:
>> > continue;
>> >+ /* Cases were lower orders might still succeed */
>> >+ case SCAN_LACK_REFERENCED_PAGE:
>> >+ case SCAN_EXCEED_NONE_PTE:
>> >+ case SCAN_EXCEED_SWAP_PTE:
>> >+ case SCAN_EXCEED_SHARED_PTE:
>> >+ case SCAN_PAGE_LOCK:
>> >+ case SCAN_PAGE_COUNT:
>> >+ case SCAN_PAGE_LRU:
>> >+ case SCAN_PAGE_NULL:
>> >+ case SCAN_DEL_PAGE_LRU:
>> >+ case SCAN_PTE_NON_PRESENT:
>> >+ case SCAN_PTE_UFFD_WP:
>> >+ case SCAN_ALLOC_HUGE_PAGE_FAIL:
>> >+ goto next_order;
>> >+ /* All other cases should stop collapse attempts */
>> >+ default:
>> >+ break;
>> > }
>> >+ break;
>>
>> One question here:
>
>Hi Wei Yang,
>
>Sorry I forgot to get back to this email.
>
No problem, thanks for taking a look.
>>
>> Suppose we have iterated several orders and not collapse successfully yet. So
>> the mthp_bitmap_stack[] would look like this:
>>
>> [8 7 6 6]
>> ^
>> |
>
>so we always pop before pushing. So it would go
>
>[9]
>pop
>if (collapse fails)
>[8 8]
>lets say we pop and successfully collapse a order 8
>[8]
>Then we fail the other order 8
>[7 7]
>now if we succeed the first order 7
>[7 6 6]
>I believe we are now in the state you wanted to describe.
>
>>
>> Now we found this one pass the threshold check, but it fails with other
>> result.
>
>ok lets say we pass the threshold checks, but the collapse fails for
>any reason that is described in the
>/* Cases were lower orders might still succeed */
>In this case we would continue to order 5 (or lower). Once we are done
>with this branch of the tree we go back to the other order 6 collapse.
>and eventually the order 7.
>
>>
>> Current code looks it would give up at all, but we may still have a chance to
>> collapse the above 3 range?
>
>for cases under /* All other cases should stop collapse attempts */
>Yes we would bail out and skip some collapses. I tried to think about
>all the cases were we would still want to continue trying, vs cases
>where the system is probably out of resources or hitting some major
>failure, and we should just break out (as others will probably fail
>too).
>
Thanks, your explanation is very clear.
>But this is also why I separated this patch out on its own. I was
>hoping to have some more focus on the different cases, and make sure I
>handled them in the best possible way. So I really appreciate the
>question :)
>
>* I did some digging through old message to find this *
>
>I believe these are the remaining cases. If these are hit I figured
>it's better to abort.
>
I agree we need to take care of those cases.
>/* cases where we must stop collapse attempts */
>case SCAN_CGROUP_CHARGE_FAIL:
>case SCAN_COPY_MC:
>case SCAN_ADDRESS_RANGE:
>case SCAN_PMD_NULL:
>case SCAN_ANY_PROCESS:
>case SCAN_VMA_NULL:
>case SCAN_VMA_CHECK:
>case SCAN_SCAN_ABORT:
>case SCAN_PMD_NONE:
>case SCAN_PAGE_ANON:
>case SCAN_PMD_MAPPED:
>case SCAN_FAIL:
>
>Please let me know if you think we should move these to either the
>`continue` or `next order` cases.
Take a look into these cases, it looks good to me now.
Also one of my concern is this coding style is a little hard to maintain. In
case we introduce a new result, we should remember to add it here. Otherwise
we may stop the collapse too early.
While it maybe a separate work after this patch set merged.
>
>Cheers,
>-- Nico
>
>>
>> > }
>> >
>> > next_order:
>> >--
>> >2.51.0
>>
>> --
>> Wei Yang
>> Help you, Help me
>>
--
Wei Yang
Help you, Help me
Powered by blists - more mailing lists