Message-ID: <f36ea21e-285e-458d-b3a1-e729825b6d89@arm.com>
Date: Fri, 22 Aug 2025 21:03:41 +0530
From: Dev Jain <dev.jain@....com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 David Hildenbrand <david@...hat.com>
Cc: Nico Pache <npache@...hat.com>, linux-mm@...ck.org,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-trace-kernel@...r.kernel.org, ziy@...dia.com,
 baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com,
 ryan.roberts@....com, corbet@....net, rostedt@...dmis.org,
 mhiramat@...nel.org, mathieu.desnoyers@...icios.com,
 akpm@...ux-foundation.org, baohua@...nel.org, willy@...radead.org,
 peterx@...hat.com, wangkefeng.wang@...wei.com, usamaarif642@...il.com,
 sunnanyong@...wei.com, vishal.moola@...il.com,
 thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com,
 kirill.shutemov@...ux.intel.com, aarcange@...hat.com, raquini@...hat.com,
 anshuman.khandual@....com, catalin.marinas@....com, tiwai@...e.de,
 will@...nel.org, dave.hansen@...ux.intel.com, jack@...e.cz, cl@...two.org,
 jglisse@...gle.com, surenb@...gle.com, zokeefe@...gle.com,
 hannes@...xchg.org, rientjes@...gle.com, mhocko@...e.com,
 rdunlap@...radead.org, hughd@...gle.com
Subject: Re: [PATCH v10 00/13] khugepaged: mTHP support


On 22/08/25 8:19 pm, Lorenzo Stoakes wrote:
> On Fri, Aug 22, 2025 at 04:10:35PM +0200, David Hildenbrand wrote:
>>>> One could also easily support the value 255 (HPAGE_PMD_NR / 2 - 1), but not sure
>>>> if we have to add that for now.
>>> Yeah not so sure about this, this is a 'just have to know' too, and yes you
>>> might add it to the docs, but people are going to be mightily confused, esp if
>>> it's a calculated value.
>>>
>>> I don't see any other way around having a separate tunable if we don't just have
>>> something VERY simple like on/off.
>> Yeah, not advocating that we add support for other values than 0/511,
>> really.
> Yeah I'm fine with 0/511.
>
>>> Also the mentioned issue sounds like something that needs to be fixed elsewhere
>>> honestly in the algorithm used to figure out mTHP ranges (I may be wrong - and
>>> happy to stand corrected if this is somehow inherent, but really feels that
>>> way).
>> I think the creep is unavoidable for certain values.
>>
>> If you have the first two pages of a PMD area populated, and you allow for
>> at least half of the #PTEs to be none/zero, you'd collapse first an
>> order-2 folio, then an order-3 ... until you reached PMD order.
> Feels like we should be looking at this in reverse? What's the largest, then
> next largest, then etc.?
>
> Surely this is the sensible way of doing it?

What David means is this: suppose all orders are enabled, and collapse fails
for order-9, then order-8, then order-7, and so on, *only* because the
distribution of PTEs does not satisfy the scaled max_ptes_none. Say the
order-4 collapse then succeeds.

The next time khugepaged scans the range, it tries order-9 and fails, then
order-8 and fails, and so on. But when it checks order-5, the range now
satisfies the scaled max_ptes_none constraint, and it does so only because
the previous cycle's order-4 collapse changed the distribution of PTEs.
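
To make the creep concrete, here is a small userspace toy model (not kernel
code). It uses David's example of a PMD range with only the first two PTEs
populated. Everything in it is an assumption for illustration: 4K pages with
HPAGE_PMD_ORDER == 9, a scaling rule of max_ptes_none >> (9 - order), a
minimum mTHP order of 2, at most one collapse per scan, and the helper names
scaled_max_ptes_none(), scan_once() and simulate(), which are hypothetical.

```c
#include <stdbool.h>

#define PMD_ORDER 9                 /* assumption: 4K pages, 2M PMD */
#define PMD_NR    (1 << PMD_ORDER)  /* 512 PTEs in one PMD range */
#define MIN_ORDER 2                 /* assumption: smallest mTHP order */

/* Assumed scaling rule: shrink max_ptes_none in proportion to the order. */
static int scaled_max_ptes_none(int max_ptes_none, int order)
{
	return max_ptes_none >> (PMD_ORDER - order);
}

/*
 * One toy scan: walk orders from largest to smallest and collapse the
 * first aligned range that passes the scaled limit. Returns the order
 * that was collapsed, or 0 if nothing qualified.
 */
static int scan_once(bool *present, int *folio_order, int max_ptes_none)
{
	for (int order = PMD_ORDER; order >= MIN_ORDER; order--) {
		int nr = 1 << order;

		for (int start = 0; start < PMD_NR; start += nr) {
			int none = 0;

			for (int i = start; i < start + nr; i++)
				none += !present[i];

			/* Skip empty ranges and ranges already this large. */
			if (none == nr || folio_order[start] >= order)
				continue;
			if (none > scaled_max_ptes_none(max_ptes_none, order))
				continue;

			/* "Collapse": every PTE in the range becomes populated. */
			for (int i = start; i < start + nr; i++) {
				present[i] = true;
				folio_order[i] = order;
			}
			return order;
		}
	}
	return 0;
}

/*
 * Start with only PTEs 0 and 1 populated and rescan until nothing more
 * collapses. Returns the last order collapsed (0 if none) and stores the
 * number of collapses in *collapses.
 */
static int simulate(int max_ptes_none, int *collapses)
{
	bool present[PMD_NR] = { false };
	int folio_order[PMD_NR] = { 0 };
	int order, last = 0;

	present[0] = present[1] = true;
	*collapses = 0;
	while ((order = scan_once(present, folio_order, max_ptes_none)) != 0) {
		last = order;
		(*collapses)++;
	}
	return last;
}
```

In this model, simulate(256, &n) ends at order 9 after 8 collapses (order-2,
then order-3, and so on up to PMD order), even though the first scan rejects
every order above 2. simulate(255, &n) never collapses anything,
simulate(511, &n) collapses straight to order 9 in one step, and
simulate(0, &n) does nothing.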
  

>
>> So for now we really should just support 0 / 511 to say "don't collapse if
>> there are holes" vs. "always collapse if there is at least one pte used".
> Yes.
>
>>>> Because, as raised in the past, I'm afraid nobody on this earth has a clue how
>>>> to set this parameter to values different to 0 (don't waste memory with khugepaged)
>>>> and 511 (page fault behavior).
>>> Yup
>>>
>>>>
>>>> If any other value is set, essentially
>>>> 	pr_warn("Unsupported 'max_ptes_none' value for mTHP collapse");
>>>>
>>>> for now and just disable it.
>>> Hmm but under what circumstances? I would just say unsupported value not mention
>>> mTHP or people who don't use mTHP might find that confusing.
>> Well, we can check whether any mTHP size is enabled while the value is set
>> to something unexpected. We can then even print the problematic sizes if we
>> have to.
> Ack
>
>> We could also just say that if the value is set to something other than
>> 511 (which is the default), it will be treated as being "0" when collapsing
>> mthp, instead of doing any scaling.
> Or we could make it an error to set anything but 0, 511, but on the other hand
> that's likely to break userspace so yeah probably not.
>
> Maybe have a warning saying 'this is no longer supported and will be ignored'
> then set the value to 0 for anything but 511 or 0.
>
> Then can remove the warning later.
>
> By having 0/511 we can really simplify the 'scaling' logic too which would be
> fantastic! :)
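
A rough userspace sketch of the 0/511 policy discussed above, following the
variant where any value other than 511 is treated as 0 for mTHP collapse
while PMD-order collapse keeps the configured value. This is only an
illustration under assumptions: the helper name mthp_max_ptes_none() is
hypothetical, HPAGE_PMD_ORDER is taken to be 9, and fprintf() stands in for
whatever warning the kernel would actually print (for example pr_warn).

```c
#include <stdio.h>

#define HPAGE_PMD_ORDER 9                       /* assumption: 4K pages */
#define HPAGE_PMD_NR    (1 << HPAGE_PMD_ORDER)  /* 512 */

/*
 * Hypothetical helper: effective none-PTE limit for a collapse of the
 * given order when only 0 and 511 are supported for mTHP.
 */
static int mthp_max_ptes_none(int max_ptes_none, int order)
{
	/* PMD-order collapse keeps the configured value unchanged. */
	if (order == HPAGE_PMD_ORDER)
		return max_ptes_none;

	/* 511: collapse if at least one PTE in the range is used. */
	if (max_ptes_none == HPAGE_PMD_NR - 1)
		return (1 << order) - 1;

	/* Any other non-zero value: warn, then behave as if it were 0. */
	if (max_ptes_none != 0)
		fprintf(stderr,
			"max_ptes_none=%d is unsupported for mTHP collapse, treating it as 0\n",
			max_ptes_none);

	/* 0: do not collapse if there are holes. */
	return 0;
}
```

With this, mthp_max_ptes_none(511, 4) gives 15, mthp_max_ptes_none(255, 4)
warns and gives 0, and mthp_max_ptes_none(255, 9) still gives 255. No
per-order scaling of intermediate values is left.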

FWIW, here is my earlier implementation of this, for everyone's convenience:
https://lore.kernel.org/all/20250211111326.14295-17-dev.jain@arm.com/

>
> Cheers, Lorenzo
