Message-ID: <8305ddf7-1ada-4a75-a2c3-385b530b25d4@redhat.com>
Date: Mon, 20 Jan 2025 14:56:07 +0100
From: David Hildenbrand <david@...hat.com>
To: Ryan Roberts <ryan.roberts@....com>, Nico Pache <npache@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 anshuman.khandual@....com, catalin.marinas@....com, cl@...two.org,
 vbabka@...e.cz, mhocko@...e.com, apopple@...dia.com,
 dave.hansen@...ux.intel.com, will@...nel.org, baohua@...nel.org,
 jack@...e.cz, srivatsa@...il.mit.edu, haowenchao22@...il.com,
 hughd@...gle.com, aneesh.kumar@...nel.org, yang@...amperecomputing.com,
 peterx@...hat.com, ioworker0@...il.com, wangkefeng.wang@...wei.com,
 ziy@...dia.com, jglisse@...gle.com, surenb@...gle.com,
 vishal.moola@...il.com, zokeefe@...gle.com, zhengqi.arch@...edance.com,
 jhubbard@...dia.com, 21cnbao@...il.com, willy@...radead.org,
 kirill.shutemov@...ux.intel.com, aarcange@...hat.com, raquini@...hat.com,
 dev.jain@....com, sunnanyong@...wei.com, usamaarif642@...il.com,
 audra@...hat.com, akpm@...ux-foundation.org
Subject: Re: [RFC 00/11] khugepaged: mTHP support

On 20.01.25 14:37, Ryan Roberts wrote:
> On 20/01/2025 12:54, David Hildenbrand wrote:
>>>> I think the one problem that emerged during review of Dev's series, which we don't
>>>> have a proper solution to yet, is the issue of "creep", where regions can be
>>>> collapsed to progressively higher orders through iterative scans. At each
>>>> collapse, the required thresholds (e.g. max_ptes_none) are met, and the collapse
>>>> effectively adds more non-none ptes so the next scan will then collapse to even
>>>> higher order. Does your solution suffer from this (theoretical/edge case) issue?
>>>> If not, how did you solve?
>>>
>>> Yes, sadly it suffers from the same issue. Bringing max_ptes_none much
>>> lower as a default would "help".
>>
>> Can we just keep it simple and only support max_ptes_none = 511 ("pagefault
>> behavior" -- PMD_NR_PAGES - 1) or max_ptes_none = 0 ("deferred behavior") and
>> document that the other weird configurations will make mTHP skip, because "weird
>> and unexpected" ? :)
>>
> 
> That sounds like a great simplification in principle!

And certainly much easier to start with :)

If we ever get the request to support something else, maybe that's also 
where we can learn *why*, and what we would actually want to do with mTHP.
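
A rough model of the "creep" problem, and of how restricting mTHP collapse to max_ptes_none = 0 or 511 avoids it, might look like the sketch below. It is not kernel code, and every name in it is hypothetical. It assumes a PMD maps 512 PTEs (order 9), that the populated PTEs form a prefix of the PMD range, that the PMD-level max_ptes_none is scaled to smaller orders with a right shift (the series may compute this differently), and that each scan collapses to the highest order whose threshold is met.

```python
# Model only, not kernel code. Assumptions: PMD = 512 PTEs (order 9), mTHP
# orders 2..8, populated PTEs form a prefix of the PMD range, and the
# PMD-level max_ptes_none is scaled by a right shift. Names are hypothetical.
PMD_ORDER = 9
PMD_NR_PAGES = 1 << PMD_ORDER
MIN_MTHP_ORDER = 2


def scaled_max_ptes_none(max_ptes_none: int, order: int) -> int:
    """Scale the PMD-level max_ptes_none down to a smaller folio order."""
    return max_ptes_none >> (PMD_ORDER - order)


def mthp_collapse_allowed(max_ptes_none: int, order: int) -> bool:
    """Proposed simplification: mTHP collapse only honours 0 ("deferred")
    and PMD_NR_PAGES - 1 ("pagefault"); any other value makes it skip."""
    if order == PMD_ORDER:
        return True  # PMD-sized collapse keeps its current behaviour
    return max_ptes_none in (0, PMD_NR_PAGES - 1)


def pick_collapse_order(present: int, max_ptes_none: int, simplify: bool):
    """Return the highest order whose leading 2**order PTEs satisfy the
    scaled threshold, or None if no order qualifies."""
    for order in range(PMD_ORDER, MIN_MTHP_ORDER - 1, -1):
        if simplify and not mthp_collapse_allowed(max_ptes_none, order):
            continue
        none = max(0, (1 << order) - present)
        if none <= scaled_max_ptes_none(max_ptes_none, order):
            return order
    return None


def growing_scans(present: int, max_ptes_none: int, simplify: bool) -> list:
    """Repeat scans until a collapse no longer adds PTEs; return the order
    collapsed to at each scan that grew the populated region."""
    history = []
    while True:
        order = pick_collapse_order(present, max_ptes_none, simplify)
        if order is None or (1 << order) <= present:
            return history
        present = 1 << order  # the collapse fills the none PTEs in the range
        history.append(order)


# Start with 8 populated PTEs at the beginning of the PMD range.
for setting in (0, 256, PMD_NR_PAGES - 1):
    print(setting, growing_scans(8, setting, False), growing_scans(8, setting, True))
# 0 [] []
# 256 [4, 5, 6, 7, 8, 9] []
# 511 [9] [9]
```

With the intermediate value 256, each scan collapses one order higher than the last until the whole PMD is populated, which is the creep. With 511 the first scan goes straight to the PMD (as a page fault would), and with 0 a collapse never adds none PTEs, so neither setting creeps; under the simplification, 256 simply makes mTHP collapse skip.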

> We would need to consider
> the swap and shared tunables too though. Perhaps we can pull a similar trick
> with those?

Swapped and shared are a bit more challenging, because their defaults 
come from heuristics: "/ 2" for max_ptes_shared and "/ 8" for max_ptes_swap.


One simple starting point here is, of course, to say "when collapsing 
mTHP, all PTEs have to be unshared and all have to be swapped in", 
essentially ignoring both tunables for mTHP collapse (in a 
memory-friendly way, as if they were set to 0) and worrying about that 
later, when really required.

The other alternative I discussed with Nico for these (not sure which 
one is implemented here) is to scale them proportionally to the order of 
the folio we are collapsing:

Assuming max_ptes_swap = 64 (PMD: 512 PTEs) and we are collapsing a 1 
MiB mTHP (256 PTEs), 32 PTEs would be allowed to be swapped out.
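
Both options can be written down as small limit functions. This is a sketch under stated assumptions, not code from the series: the function names are hypothetical, a PMD is assumed to cover 512 PTEs (order 9, 4 KiB base pages), and the defaults follow the "/ 8" (swap) and "/ 2" (shared) heuristics.

```python
# Sketch only; function names are hypothetical. Assumes a PMD covers
# 512 PTEs (order 9) and defaults follow the "/ 8" and "/ 2" heuristics.
PMD_ORDER = 9
PMD_NR_PAGES = 1 << PMD_ORDER

DEFAULT_MAX_PTES_SWAP = PMD_NR_PAGES // 8    # 64
DEFAULT_MAX_PTES_SHARED = PMD_NR_PAGES // 2  # 256


def mthp_limit_strict(max_ptes: int, order: int) -> int:
    """Starting point: for mTHP every PTE must be swapped in / unshared,
    i.e. behave as if the tunable were 0; PMD collapse keeps the tunable."""
    return max_ptes if order == PMD_ORDER else 0


def mthp_limit_proportional(max_ptes: int, order: int) -> int:
    """Alternative: scale the PMD-level limit by the fraction of the PMD
    range the folio covers (2**order out of PMD_NR_PAGES PTEs)."""
    return max_ptes * (1 << order) // PMD_NR_PAGES


# 1 MiB mTHP with 4 KiB pages = order 8 = 256 PTEs
print(mthp_limit_proportional(DEFAULT_MAX_PTES_SWAP, 8))    # 32
print(mthp_limit_strict(DEFAULT_MAX_PTES_SWAP, 8))          # 0
print(mthp_limit_proportional(DEFAULT_MAX_PTES_SHARED, 8))  # 128
```

Note that integer division rounds down, so the proportional limit reaches 0 for the smallest orders (64 * 4 // 512 == 0 at order 2), where the proportional variant behaves like the strict one.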

-- 
Cheers,

David / dhildenb