Message-ID: <c8c5e818-536a-4d72-b8dc-36aeb1b61800@arm.com>
Date: Thu, 28 Aug 2025 16:18:48 +0530
From: Dev Jain <dev.jain@....com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>,
David Hildenbrand <david@...hat.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Nico Pache <npache@...hat.com>, linux-mm@...ck.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, ziy@...dia.com, Liam.Howlett@...cle.com,
ryan.roberts@....com, corbet@....net, rostedt@...dmis.org,
mhiramat@...nel.org, mathieu.desnoyers@...icios.com,
akpm@...ux-foundation.org, baohua@...nel.org, willy@...radead.org,
peterx@...hat.com, wangkefeng.wang@...wei.com, usamaarif642@...il.com,
sunnanyong@...wei.com, vishal.moola@...il.com,
thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com,
kirill.shutemov@...ux.intel.com, aarcange@...hat.com, raquini@...hat.com,
anshuman.khandual@....com, catalin.marinas@....com, tiwai@...e.de,
will@...nel.org, dave.hansen@...ux.intel.com, jack@...e.cz, cl@...two.org,
jglisse@...gle.com, surenb@...gle.com, zokeefe@...gle.com,
hannes@...xchg.org, rientjes@...gle.com, mhocko@...e.com,
rdunlap@...radead.org, hughd@...gle.com
Subject: Re: [PATCH v10 00/13] khugepaged: mTHP support
On 28/08/25 3:16 pm, Baolin Wang wrote:
> (Sorry for chiming in late)
>
> On 2025/8/22 22:10, David Hildenbrand wrote:
>>>> One could also easily support the value 255 (HPAGE_PMD_NR / 2 - 1),
>>>> but not sure
>>>> if we have to add that for now.
>>>
>>> Yeah not so sure about this, this is a 'just have to know' too, and
>>> yes you
>>> might add it to the docs, but people are going to be mightily
>>> confused, esp if
>>> it's a calculated value.
>>>
>>> I don't see any other way around having a separate tunable if we
>>> don't just have
>>> something VERY simple like on/off.
>>
>> Yeah, not advocating that we add support for other values than 0/511,
>> really.
>>
>>>
>>> Also the mentioned issue sounds like something that needs to be
>>> fixed elsewhere
>>> honestly in the algorithm used to figure out mTHP ranges (I may be
>>> wrong - and
>>> happy to stand corrected if this is somehow inherent, but really
>>> feels that
>>> way).
>>
>> I think the creep is unavoidable for certain values.
>>
>> If you have the first two pages of a PMD area populated, and you
>> allow for at least half of the #PTEs to be none/zero, you'd collapse
>> first an
>> order-2 folio, then an order-3 ... until you reach PMD order.
>>
>> So for now we really should just support 0 / 511 to say "don't
>> collapse if there are holes" vs. "always collapse if there is at
>> least one pte used".
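To make the creep concrete, here is a minimal sketch in C. It assumes
4K pages (HPAGE_PMD_ORDER = 9) and a simplified per-PTE model in which
the PMD-level limit is scaled down to each order by a right shift.
scaled_max_none() and creep() are hypothetical helpers written for
illustration only; they are not functions from the series.

```c
#include <assert.h>

#define HPAGE_PMD_ORDER 9
#define HPAGE_PMD_NR (1 << HPAGE_PMD_ORDER)
#define MIN_MTHP_ORDER 2

/* Hypothetical helper: scale the PMD-level limit down to 'order'. */
static int scaled_max_none(int max_ptes_none, int order)
{
	assert(order >= MIN_MTHP_ORDER && order <= HPAGE_PMD_ORDER);
	return max_ptes_none >> (HPAGE_PMD_ORDER - order);
}

/*
 * Hypothetical model of successive scans: 'present' populated PTEs sit
 * at the start of the PMD range. Each step tries the next larger order
 * and collapses if the none/zero count is within the scaled limit; a
 * collapse populates the whole range. Returns the highest order
 * reached, or 0 if nothing was collapsed.
 */
static int creep(int max_ptes_none, int present)
{
	int reached = 0;

	for (int order = MIN_MTHP_ORDER; order <= HPAGE_PMD_ORDER; order++) {
		int nr = 1 << order;
		int none = nr - present;

		if (none > scaled_max_none(max_ptes_none, order))
			break;          /* too many holes, stop here */
		present = nr;           /* range is now fully populated */
		reached = order;
	}
	return reached;
}
```

In this model, creep(256, 2) walks order 2, 3, ... up to order 9,
because every collapse leaves exactly half of the next larger range
populated. creep(0, 2) returns 0: with no holes allowed, nothing is
collapsed.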
>
> If we only allow setting 0 or 511, as Nico mentioned before, "At 511,
> no mTHP collapses would ever occur anyway, unless you have 2MB
> disabled and other mTHP sizes enabled. Technically, at 511, only the
> highest enabled order would ever be collapsed."
I didn't understand this statement. At 511, mTHP collapses will occur if
khugepaged cannot get a PMD folio. Our goal is to collapse to the
highest order folio.
>
> In other words, for the scenario you described, although there are
> only 2 PTEs present in a PMD, it would still get collapsed into a
> PMD-sized THP. In reality, what we probably need is just an order-2
> mTHP collapse.
>
> If 'khugepaged_max_ptes_none' is set to 255, I think this would
> achieve the desired result: when there are only 2 PTEs present in a
> PMD, an order-2 mTHP collapse would succeed, but it wouldn’t
> creep up to an order-3 mTHP collapse. That’s because:
> When attempting an order-3 mTHP collapse, 'threshold_bits' = 1, while
> 'bits_set' = 1 (means only 1 chunk is present), so 'bits_set >
> threshold_bits' is false, then an order-3 mTHP collapse wouldn’t be
> attempted. No?
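The arithmetic above can be sketched in C as follows. This assumes 4K
pages, one bitmap bit per order-2 chunk, and a threshold of the form
(HPAGE_PMD_NR - max_ptes_none - 1) shifted down by order. That formula
is an assumption chosen to match the numbers quoted here (it yields
threshold_bits = 1 for order-3 at 255); the function names are
hypothetical, not taken from the series.

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_ORDER 9
#define HPAGE_PMD_NR (1 << HPAGE_PMD_ORDER)
#define MIN_MTHP_ORDER 2

/*
 * Assumed threshold: number of populated chunks that must be exceeded
 * before a collapse of 'order' is attempted.
 */
static int threshold_bits(int max_ptes_none, int order)
{
	assert(order >= MIN_MTHP_ORDER && order <= HPAGE_PMD_ORDER);
	return (HPAGE_PMD_NR - max_ptes_none - 1) >>
	       (HPAGE_PMD_ORDER - (order - MIN_MTHP_ORDER));
}

/* The comparison described above: 'bits_set > threshold_bits'. */
static bool would_attempt(int max_ptes_none, int order, int bits_set)
{
	return bits_set > threshold_bits(max_ptes_none, order);
}
```

Under these assumptions, with one populated chunk (bits_set = 1):
would_attempt(255, 2, 1) is true, would_attempt(255, 3, 1) is false,
and would_attempt(256, 3, 1) is true. So 255 stops at order-2, while
256 lets the order-3 attempt through.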
>
> So I have some concerns that if we only allow setting 0 or 511, it may
> not meet the goal we have for mTHP collapsing.
>
>>>> Because, as raised in the past, I'm afraid nobody on this earth has
>>>> a clue how
>>>> to set this parameter to values different to 0 (don't waste memory
>>>> with khugepaged)
>>>> and 511 (page fault behavior).
>>>
>>> Yup
>>>
>>>>
>>>>
>>>> If any other value is set, essentially
>>>> pr_warn("Unsupported 'max_ptes_none' value for mTHP collapse");
>>>>
>>>> for now and just disable it.
>>>
>>> Hmm but under what circumstances? I would just say unsupported value
>>> not mention
>>> mTHP or people who don't use mTHP might find that confusing.
>>
>> Well, we can check whether any mTHP size is enabled while the value
>> is set to something unexpected. We can then even print the
>> problematic sizes if we have to.
>>
>> We could also just say that if the value is set to something
>> other than 511 (which is the default), it will be treated as being "0"
>> when collapsing mTHP, instead of doing any scaling.
>>
>