Message-ID: <bf03c2e2-66fc-4745-952a-de3fbf65c4ab@redhat.com>
Date: Mon, 1 Sep 2025 19:06:21 +0200
From: David Hildenbrand <david@...hat.com>
To: Nico Pache <npache@...hat.com>, linux-mm@...ck.org,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-trace-kernel@...r.kernel.org
Cc: ziy@...dia.com, baolin.wang@...ux.alibaba.com,
 lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, ryan.roberts@....com,
 dev.jain@....com, corbet@....net, rostedt@...dmis.org, mhiramat@...nel.org,
 mathieu.desnoyers@...icios.com, akpm@...ux-foundation.org,
 baohua@...nel.org, willy@...radead.org, peterx@...hat.com,
 wangkefeng.wang@...wei.com, usamaarif642@...il.com, sunnanyong@...wei.com,
 vishal.moola@...il.com, thomas.hellstrom@...ux.intel.com,
 yang@...amperecomputing.com, kirill.shutemov@...ux.intel.com,
 aarcange@...hat.com, raquini@...hat.com, anshuman.khandual@....com,
 catalin.marinas@....com, tiwai@...e.de, will@...nel.org,
 dave.hansen@...ux.intel.com, jack@...e.cz, cl@...two.org,
 jglisse@...gle.com, surenb@...gle.com, zokeefe@...gle.com,
 hannes@...xchg.org, rientjes@...gle.com, mhocko@...e.com,
 rdunlap@...radead.org, hughd@...gle.com
Subject: Re: [PATCH v10 00/13] khugepaged: mTHP support

On 01.09.25 18:21, David Hildenbrand wrote:
> On 19.08.25 15:41, Nico Pache wrote:
>> The following series provides khugepaged with the capability to collapse
>> anonymous memory regions to mTHPs.
>>
>> To achieve this we generalize the khugepaged functions to no longer depend
>> on PMD_ORDER. Then during the PMD scan, we use a bitmap to track chunks of
>> pages (defined by KHUGEPAGED_MTHP_MIN_ORDER) that are utilized. After the
>> PMD scan is done, we do binary recursion on the bitmap to find the optimal
>> mTHP sizes for the PMD range. The restriction on max_ptes_none is removed
>> during the scan, to make sure we account for the whole PMD range. When no
>> mTHP size is enabled, the legacy behavior of khugepaged is maintained.
>> max_ptes_none will be scaled by the attempted collapse order to determine
>> how full an mTHP must be to be eligible for the collapse to occur. If an
>> mTHP collapse is attempted but the range contains swapped-out or shared
>> pages, we don't perform the collapse. It is now also possible to collapse
>> to mTHPs without requiring the PMD THP size to be enabled.
>>
>> With the default max_ptes_none=511, the code should keep most of its
>> original behavior. When enabling multiple adjacent (m)THP sizes we need to
>> set max_ptes_none<=255. With max_ptes_none > HPAGE_PMD_NR/2 you will
>> experience collapse "creep" and constantly promote mTHPs to the next
>> available size. This is due to the fact that a collapse will introduce at
>> least 2x the number of pages, and on a future scan will satisfy the
>> promotion condition once again.
>>
>> Patch 1:     Refactor/rename hpage_collapse
>> Patch 2:     Some refactoring to combine madvise_collapse and khugepaged
>> Patch 3-5:   Generalize khugepaged functions for arbitrary orders
>> Patch 6-8:   The mTHP patches
>> Patch 9-10:  Allow khugepaged to operate without PMD enabled
>> Patch 11-12: Tracing/stats
>> Patch 13:    Documentation
> 
> Would it be feasible to start with simply not supporting the
> max_ptes_none parameter in the first version, just like we won't support
> max_ptes_swap/max_ptes_shared in the first version?
> 
> That gives us more time to think about how to use/modify the old interface.
> 
> For example, I could envision a ratio-based interface, or as discussed
> with Lorenzo a simple boolean. We could make the existing max_ptes*
> interface backwards compatible then.
> 
> That also gives us the opportunity to think about the creep problem
> separately.
> 
> I'm sure initial mTHP collapse will be valuable even without support for
> that weird set of parameters.
> 
> Would there be a problem implementation-wise?
> 
> But let me think further about the creep problem ... :/

FWIW, I just looked around and there is documented usage of setting 
max_ptes_none to 0 [1, 2, 3].

In essence, I think it can make sense to set it to 0 when an application 
wants to manage THP on its own (MADV_COLLAPSE), and avoid khugepaged 
interfering. Now, using a system-wide toggle for such a use case is 
rather questionable, but it's all we have.
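
For illustration, here is a minimal sketch of how a deployment might flip
that system-wide toggle. The sysfs path is the standard khugepaged tunable;
the helper names and the 0..511 range check are my assumptions (the range
assumes HPAGE_PMD_NR == 512, i.e. 4 KiB base pages with 2 MiB PMDs), and
writing the real file requires root.

```python
from pathlib import Path

# Standard location of the khugepaged tunables on Linux.
KHUGEPAGED_DIR = Path("/sys/kernel/mm/transparent_hugepage/khugepaged")


def read_max_ptes_none(base: Path = KHUGEPAGED_DIR) -> int:
    """Return the current max_ptes_none value (hypothetical helper)."""
    return int((base / "max_ptes_none").read_text().strip())


def write_max_ptes_none(value: int, base: Path = KHUGEPAGED_DIR) -> None:
    """Set max_ptes_none (hypothetical helper).

    The 0..511 bound is an assumption tied to HPAGE_PMD_NR == 512.
    """
    if not 0 <= value <= 511:
        raise ValueError("max_ptes_none must be in 0..511 on this configuration")
    (base / "max_ptes_none").write_text(f"{value}\n")
```

Calling write_max_ptes_none(0) as root would make khugepaged collapse only
ranges without any none PTEs, leaving other collapses to the application
via MADV_COLLAPSE.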

I did not find anything recommending values other than 0 
or 511 -- so far.

So *likely* focusing on 0 vs. 511 initially would cover most use cases 
out there. Ignoring the parameter initially (require all to be !none) 
could of course also work.
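
To make the 0 vs. 511 comparison and the quoted "creep" description
concrete, here is a rough model, not the kernel code. It assumes the
per-order limit is max_ptes_none shifted right by (HPAGE_PMD_ORDER - order),
which is my reading of "scaled by the attempted collapse order", and it
assumes HPAGE_PMD_ORDER == 9. All function names are made up for the
example.

```python
HPAGE_PMD_ORDER = 9  # assumption: 4 KiB base pages, 2 MiB PMD


def scaled_max_ptes_none(max_ptes_none: int, order: int) -> int:
    """Assumed scaling: shrink the PMD-level limit to the attempted order."""
    return max_ptes_none >> (HPAGE_PMD_ORDER - order)


def would_collapse(present: int, order: int, max_ptes_none: int) -> bool:
    """True if a range with `present` populated PTEs qualifies at `order`."""
    none = (1 << order) - present
    return none <= scaled_max_ptes_none(max_ptes_none, order)


def creeps(order: int, max_ptes_none: int) -> bool:
    """After a full collapse at `order`, does the next larger order qualify
    on a later scan without any new pages being touched?"""
    if order >= HPAGE_PMD_ORDER:
        return False
    return would_collapse(1 << order, order + 1, max_ptes_none)


if __name__ == "__main__":
    for limit in (0, 255, 511):
        print(limit, creeps(4, limit))
```

Under these assumptions, 511 lets a fully populated order-4 range qualify
for order 5 right away, while 255 and 0 do not. With 0, a range qualifies
only when every PTE in it is present.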

[1] https://www.mongodb.com/docs/manual/administration/tcmalloc-performance/
[2] https://google.github.io/tcmalloc/tuning.html
[3] https://support.yugabyte.com/hc/en-us/articles/36558155921165-Mitigating-Excessive-RSS-Memory-Usage-Due-to-THP-Transparent-Huge-Pages

-- 
Cheers

David / dhildenb

