[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5f77366d-e100-46bb-ac85-aa4b216eb2cf@redhat.com>
Date: Thu, 15 May 2025 21:12:13 +0200
From: David Hildenbrand <david@...hat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Usama Arif <usamaarif642@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
hannes@...xchg.org, shakeel.butt@...ux.dev, riel@...riel.com,
ziy@...dia.com, laoar.shao@...il.com, baolin.wang@...ux.alibaba.com,
Liam.Howlett@...cle.com, npache@...hat.com, ryan.roberts@....com,
linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH 1/6] prctl: introduce PR_THP_POLICY_DEFAULT_HUGE for the
process
On 15.05.25 20:08, Lorenzo Stoakes wrote:
> On Thu, May 15, 2025 at 06:11:55PM +0200, David Hildenbrand wrote:
>>>>> So if you're not overriding VM_NOHUGEPAGE, the whole point of this exercise
>>>>> is to override global 'never'?
>>>>>
>>>>
>>>> Again, I am not overriding never.
>>>>
>>>> hugepage_global_always and hugepage_global_enabled will evaluate to false
>>>> and you will not get a hugepage.
>>>
>>> Yeah, again ack, but I kind of hate that we set VM_HUGEPAGE everywhere even
>>> if the policy is never.
>>
>> I think it should behave just as if someone does manually an madvise(). So
>> whatever we do here during an madvise, we should try to do the same thing
>> here.
>
> Ack I agree with this.
>
> It actually simplifies things a LOT to view it this way - we're saying 'by
> default apply madvise(...) to new VMAs'.
>
> Hm I wonder if we could have a more generic version of this...
>
> Note though that we're not _quite_ doing this.
>
> So in hugepage_madvise():
>
> int hugepage_madvise(struct vm_area_struct *vma,
> unsigned long *vm_flags, int advice)
> {
> ...
>
> switch (advice) {
> case MADV_HUGEPAGE:
> *vm_flags &= ~VM_NOHUGEPAGE;
> *vm_flags |= VM_HUGEPAGE;
>
> ...
>
> break;
>
> ...
> }
>
> ...
> }
>
> So here we're actually clearing VM_NOHUGEPAGE and overriding it, but in the
> proposed code we're not.
Yeah, I think I suggested that, but probably we should just do exactly
what madvise() does.
>
> So we're back into confusing territory again :)
>
> I wonder if we could...
>
> 1. Add an MADV_xxx that mimics the desired behaviour here.
>
> 2. Add a generic 'madvise() by default' thing at a process level?
>
> Is this crazy?
I think that's what I had in mind, just a bit twisted.
What could work is
1) prctl to set the default
2) madvise() to adjust all existing VMAs
We might have to teach 2) to ignore non-compatible VMAs / holes. Maybe
not, worth an investigation.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists