Message-ID: <a027fe94-e6c2-46d0-8768-6acd8e801cc3@redhat.com>
Date: Wed, 25 Jun 2025 10:40:23 +0200
From: David Hildenbrand <david@...hat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Hugh Dickins <hughd@...gle.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>, akpm@...ux-foundation.org,
ziy@...dia.com, Liam.Howlett@...cle.com, npache@...hat.com,
ryan.roberts@....com, dev.jain@....com, baohua@...nel.org,
zokeefe@...gle.com, shy828301@...il.com, usamaarif642@...il.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 0/2] fix MADV_COLLAPSE issue if THP settings are
disabled
On 25.06.25 10:22, Lorenzo Stoakes wrote:
> On Wed, Jun 25, 2025 at 10:16:46AM +0200, David Hildenbrand wrote:
>> On 25.06.25 09:49, David Hildenbrand wrote:
>>> I think the whole use case of using MADV_COLLAPSE to completely control
>>> THP allocation in a system is otherwise pretty hard to achieve, if there
>>> is no other way to tame THP allocation through page faults+khugepaged.
>>
>> Just want to add: for an app itself, it's doable in "madvise" mode perfectly
>> fine.
>>
>> If your app does a MADV_HUGEPAGE, it can get a THP during page-fault +
>> khugepaged.
>>
>> If your app does not do a MADV_HUGEPAGE, it can get a THP through
>> MADV_COLLAPSE.
>>
>> So the "madvise" mode actually works.
>
> Right, but for me MADV_COLLAPSE is more about 'I want THPs _now_ (if available),
> not when khugepaged decides to give me some'.
>
> So we have multiple semantics at work here, unfortunately.
>
>>
>> The problem appears as soon as we want to control other processes that might
>> be setting MADV_HUGEPAGE, and we actually want to control the behavior using
>> process_madvise(MADV_COLLAPSE), to say "well, the MADV_HUGEPAGE should be
>> ignored".
>
> This is a _very_ specialist use.
>
> I'd argue for a 'manual' mode to be added to sysfs to cover this case, with
> 'never' having the 'actually means never' semantics.
>
> You might argue that could confuse things, but it'd retain the 'de facto'
> understanding nearly everybody has about what these flags mean, but give
> whatever user is out there that needs this the ability to continue doing what
> they want.
>
> And we get into philosophy about not 'breaking' userland, not sure we have a
> TLB/page fault/folio allocation efficiency contract with userland :)
>
> No program will break with this patch applied. Just potentially get performance
> degradation in a very, very specialist case.
>
>>
>> Then, you configure "never" system-wide and use
>> process_madvise(MADV_COLLAPSE) to drive it all manually.
>>
>> Curious to learn if there is such a user out there.
>
> Oh me too :)

I just looked at the original use cases [1]; such a use case is not
mentioned there.

But the series did add process_madvise(MADV_COLLAPSE) in commit
876b4a1896646cc85ec6b1fc1c9270928b7e0831, where we document:
"
This is useful for the development of userspace agents that seek to
optimize THP utilization system-wide by using userspace signals to
prioritize what memory is most deserving of being THP-backed.
"

The "prioritize" might indicate that this is used in combination with
"madvise", not with "never".
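
To make the "userspace agent" idea from that commit message a bit more concrete, here is a minimal sketch in C of how such an agent might request a collapse in another process. The helper name collapse_remote_range() is made up for illustration, and the fallback syscall numbers and MADV_COLLAPSE value are only there for older headers (an assumption about the build environment). It assumes a kernel that accepts MADV_COLLAPSE through process_madvise() and sufficient privileges over the target. Whether the call should succeed when THP is set to "never" is exactly what is being debated here, so the sketch takes no position on that.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Fallbacks for older headers; values as used by most architectures. */
#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif

/*
 * Hypothetical helper: ask the kernel to collapse [addr, addr + len) in
 * the address space of process `pid` into THPs right away.
 * Returns 0 if the syscall did not fail, -1 with errno set otherwise.
 */
int collapse_remote_range(pid_t pid, void *addr, size_t len)
{
    struct iovec iov = { .iov_base = addr, .iov_len = len };
    int saved_errno;
    long ret;

    /* Obtain a pidfd for the target; fails for invalid or unknown pids. */
    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0U);
    if (pidfd < 0)
        return -1;

    /* One iovec entry, MADV_COLLAPSE as the advice, no flags. */
    ret = syscall(SYS_process_madvise, pidfd, &iov, (size_t)1,
                  MADV_COLLAPSE, 0U);

    /* Preserve the errno of process_madvise() across close(). */
    saved_errno = errno;
    close(pidfd);
    errno = saved_errno;

    return ret < 0 ? -1 : 0;
}
```

An agent would call this per candidate range, e.g. collapse_remote_range(pid, base, 2UL << 20) for one PMD-sized range on x86-64, and inspect errno on failure to decide whether to retry or give up on that range.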

So yeah, it all boils down to:

(1) If there is no such use case, "never can mean never". Because there
    is nothing to break, really.
(2) If there is such a use case, we might be breaking it.
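
Either way, an agent that drives things via MADV_COLLAPSE would probably want to know which system-wide mode it is running under. As a rough sketch, assuming the usual top-level sysfs file /sys/kernel/mm/transparent_hugepage/enabled and its "always [madvise] never" format described in the kernel's THP admin guide, the active mode could be read like this. The function names are hypothetical, and per-size THP settings are not considered.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the bracketed (active) mode from a line such as
 * "always [madvise] never". Copies it NUL-terminated into `out` and
 * returns 0, or returns -1 if there is no "[mode]" that fits in `out`.
 */
int thp_active_mode(const char *line, char *out, size_t outlen)
{
    const char *lb = strchr(line, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    size_t n;

    if (!lb || !rb)
        return -1;

    n = (size_t)(rb - lb - 1);
    if (n == 0 || n >= outlen)
        return -1;

    memcpy(out, lb + 1, n);
    out[n] = '\0';
    return 0;
}

/* Read the system-wide setting from sysfs; -1 if it cannot be read. */
int thp_read_system_mode(char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");

    if (!f)
        return -1;
    if (!fgets(line, sizeof(line), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return thp_active_mode(line, out, outlen);
}
```

With something like this, an agent could at least detect "never" up front and decide whether relying on MADV_COLLAPSE makes sense for case (1) or (2) above.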
[1]
https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
--
Cheers,
David / dhildenb