[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ad93f480-431a-4f9b-9225-136d8c6c37df@linux.alibaba.com>
Date: Tue, 29 Apr 2025 15:16:08 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: Nico Pache <npache@...hat.com>
Cc: linux-mm@...ck.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
akpm@...ux-foundation.org, corbet@....net, rostedt@...dmis.org,
mhiramat@...nel.org, mathieu.desnoyers@...icios.com, david@...hat.com,
baohua@...nel.org, ryan.roberts@....com, willy@...radead.org,
peterx@...hat.com, ziy@...dia.com, wangkefeng.wang@...wei.com,
usamaarif642@...il.com, sunnanyong@...wei.com, vishal.moola@...il.com,
thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com,
kirill.shutemov@...ux.intel.com, aarcange@...hat.com, raquini@...hat.com,
dev.jain@....com, anshuman.khandual@....com, catalin.marinas@....com,
tiwai@...e.de, will@...nel.org, dave.hansen@...ux.intel.com, jack@...e.cz,
cl@...two.org, jglisse@...gle.com, surenb@...gle.com, zokeefe@...gle.com,
hannes@...xchg.org, rientjes@...gle.com, mhocko@...e.com,
rdunlap@...radead.org
Subject: Re: [PATCH v4 06/12] khugepaged: introduce khugepaged_scan_bitmap for
mTHP support
On 2025/4/28 22:47, Nico Pache wrote:
> On Sat, Apr 26, 2025 at 8:52 PM Baolin Wang
> <baolin.wang@...ux.alibaba.com> wrote:
>>
>>
>>
>> On 2025/4/17 08:02, Nico Pache wrote:
>>> khugepaged scans PMD ranges for potential collapse to a hugepage. To add
>>> mTHP support we use this scan to instead record chunks of utilized
>>> sections of the PMD.
>>>
>>> khugepaged_scan_bitmap uses a stack struct to recursively scan a bitmap
>>> that represents chunks of utilized regions. We can then determine what
>>> mTHP size fits best and in the following patch, we set this bitmap while
>>> scanning the PMD.
>>>
>>> max_ptes_none is used as a scale to determine how "full" an order must
>>> be before being considered for collapse.
>>>
>>> When attempting to collapse an order that has its order set to "always"
>>> lets always collapse to that order in a greedy manner without
>>> considering the number of bits set.
>>>
>>> Signed-off-by: Nico Pache <npache@...hat.com>
>>> ---
>>> include/linux/khugepaged.h | 4 ++
>>> mm/khugepaged.c | 94 ++++++++++++++++++++++++++++++++++----
>>> 2 files changed, 89 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
>>> index 1f46046080f5..18fe6eb5051d 100644
>>> --- a/include/linux/khugepaged.h
>>> +++ b/include/linux/khugepaged.h
>>> @@ -1,6 +1,10 @@
>>> /* SPDX-License-Identifier: GPL-2.0 */
>>> #ifndef _LINUX_KHUGEPAGED_H
>>> #define _LINUX_KHUGEPAGED_H
>>> +#define KHUGEPAGED_MIN_MTHP_ORDER 2
>>
>> Why is the minimum mTHP order set to 2? IMO, the file large folios can
>> support order 1, so we don't expect to collapse exec file small folios
>> to order 1 if possible?
> I should have been more specific in the patch notes, but this affects
> anonymous only. I'll go over my commit messages and make sure this is
> reflected in the next version.
OK. I am looking into how to support shmem mTHP collapse based on your
patch series.
>> (PS: I need more time to understand your logic in this patch, and any
>> additional explanation would be helpful:) )
>
> We are currently scanning ptes in a PMD. The core principle/reasoning
> behind the bitmap is to keep the PMD scan while saving its state. We
> then use this bitmap to determine which chunks of the PMD are active
> and are the best candidates for mTHP collapse. We start at the PMD
> level, and recursively break down the bitmap to find the appropriate
> sizes for the bitmap.
>
> looking at a simplified example: we scan a PMD and get the following
> bitmap, 1111101101101011 (in this case MIN_MTHP_ORDER= 5, so each bit
> == 32 ptes, in the actual set each bit == 4 ptes).
> We would first attempt a PMD collapse, while checking the number of
> bits set vs the max_ptes_none tunable. If those conditions arent
> triggered, we will try the next enabled mTHP order, for each half of
> the bitmap.
>
> ie) order 8 attempt on 11111011 and order 8 attempt on 01101011.
>
> If a collapse succeeds we dont keep recursing on that portion of the
> bitmap. If not, we continue attempting lower orders.
>
> Hopefully that helps you understand my logic here! Let me know if you
> need more clarification.
Thanks for your explanation. That's pretty much how I understand it.:)
I'll give a test for your new version.
>
> I gave a presentation on this that might help too:
> https://docs.google.com/presentation/d/1w9NYLuC2kRcMAwhcashU1LWTfmI5TIZRaTWuZq-CHEg/edit?usp=sharing&resourcekey=0-nBAGld8cP1kW26XE6i0Bpg
Unfortunately, this link requires access permission.
Powered by blists - more mailing lists