[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <07007d31-1601-4b95-b646-687295487cfe@redhat.com>
Date: Mon, 1 Sep 2025 18:15:40 +0200
From: David Hildenbrand <david@...hat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Nico Pache <npache@...hat.com>
Cc: linux-mm@...ck.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
ziy@...dia.com, baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com,
ryan.roberts@....com, dev.jain@....com, corbet@....net, rostedt@...dmis.org,
mhiramat@...nel.org, mathieu.desnoyers@...icios.com,
akpm@...ux-foundation.org, baohua@...nel.org, willy@...radead.org,
peterx@...hat.com, wangkefeng.wang@...wei.com, usamaarif642@...il.com,
sunnanyong@...wei.com, vishal.moola@...il.com,
thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com,
kirill.shutemov@...ux.intel.com, aarcange@...hat.com, raquini@...hat.com,
anshuman.khandual@....com, catalin.marinas@....com, tiwai@...e.de,
will@...nel.org, dave.hansen@...ux.intel.com, jack@...e.cz, cl@...two.org,
jglisse@...gle.com, surenb@...gle.com, zokeefe@...gle.com,
hannes@...xchg.org, rientjes@...gle.com, mhocko@...e.com,
rdunlap@...radead.org, hughd@...gle.com
Subject: Re: [PATCH v10 05/13] khugepaged: generalize __collapse_huge_page_*
for mTHP support
On 20.08.25 16:22, Lorenzo Stoakes wrote:
> On Tue, Aug 19, 2025 at 07:41:57AM -0600, Nico Pache wrote:
>> generalize the order of the __collapse_huge_page_* functions
>> to support future mTHP collapse.
>>
>> mTHP collapse can suffer from incosistant behavior, and memory waste
>> "creep". disable swapin and shared support for mTHP collapse.
I'd just note that mTHP collapse will initially not honor these two
parameters (spell them out), failing if anything is swapped out or shared.
>>
>> No functional changes in this patch.
>>
>> Reviewed-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
>> Acked-by: David Hildenbrand <david@...hat.com>
>> Co-developed-by: Dev Jain <dev.jain@....com>
>> Signed-off-by: Dev Jain <dev.jain@....com>
>> Signed-off-by: Nico Pache <npache@...hat.com>
>> ---
>> mm/khugepaged.c | 62 ++++++++++++++++++++++++++++++++++---------------
>> 1 file changed, 43 insertions(+), 19 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 77e0d8ee59a0..074101d03c9d 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -551,15 +551,17 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>> unsigned long address,
>> pte_t *pte,
>> struct collapse_control *cc,
>> - struct list_head *compound_pagelist)
>> + struct list_head *compound_pagelist,
>> + unsigned int order)
>
> I think it's better if we keep the output var as the last in the order. It's a
> bit weird to have the order specified here.
Also, while at it, just double-tab indent.
>
>> {
>> struct page *page = NULL;
>> struct folio *folio = NULL;
>> pte_t *_pte;
>> int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
>> bool writable = false;
>> + int scaled_max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
>
> This is a weird formulation, I guess we have to go with it to keep things
> consistent-ish, but it's like we have a value for this that is reliant on the
> order always being PMD and then sort of awkwardly adjusting for MTHP.
>
> I guess we're stuck with it though since we have:
>
> /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
>
> I guess a more sane version of this would be a ratio or something...
Yeah, ratios would have made much more sense ... but people didn't plan
for having something that is not a PMD size.
>
> Anyway probably out of scope here.
>
>>
>> - for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
>> + for (_pte = pte; _pte < pte + (1 << order);
>
> Hmm is this correct? I think shifting an int is probably a bad idea even if we
> can get away with it for even PUD order atm (though... 64KB ARM hm), wouldn't
> _BITUL(order) be better?
Just for completeness: we discussed that recently in other context. It
would better be BIT() instead of _BITUL().
But I am not a fan of using BIT() when not working with bitmaps etc.
"1ul << order" etc is used throughout the code base.
What makes this easier to read is:
const unsigned long nr_pages = 1ul << order;
Maybe in the future we want an ORDER_PAGES(), maybe not. Have not made
up my mind yet :)
--
Cheers
David / dhildenb
Powered by blists - more mailing lists