Message-ID: <9433c2d6-200c-4320-80f3-840ca5e66f64@redhat.com>
Date: Thu, 22 May 2025 15:05:30 +0200
From: David Hildenbrand <david@...hat.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
Arnd Bergmann <arnd@...db.de>, Christian Brauner <brauner@...nel.org>,
linux-mm@...ck.org, linux-arch@...r.kernel.org,
linux-kernel@...r.kernel.org, SeongJae Park <sj@...nel.org>,
Usama Arif <usamaarif642@...il.com>
Subject: Re: [RFC PATCH 0/5] add process_madvise() flags to modify behaviour
On 21.05.25 19:39, Shakeel Butt wrote:
> On Wed, May 21, 2025 at 05:49:15PM +0100, Lorenzo Stoakes wrote:
> [...]
>>>
>>> Please let's first get consensus on this before starting the work.
>>
>> With respect Shakeel, I'll work on whatever I want, whenever I want.
>
> I fail to understand why you would respond like that.
Relax guys ... :) Really nothing to be fighting about.

Lorenzo has a lot of energy to play with things, to see how it would
look. I wish I had that much energy, but I have no idea where it
went ... (well, okay, I have a suspicion) :P

At the same time, I hope (and assume :) ) that Lorenzo will get Usama
involved in the development once we know what we want.

To summarize my current view:

1) eBPF: most people are not a fan of that, and I agree, at least
for this purpose. If we were talking about making better *placement*
decisions using eBPF, it would be a different story.

2) prctl(): the unloved child, and I can understand why. Maybe now is
the right time to stop adding new MM things that feel weird in there.
Maybe we should already have done that with the KSM toggle (guess who
was involved in that ;) ).

3) process_madvise(): I think it's an interesting extension, but
probably we should just have something that applies to the whole
address space naturally. At least my take for now.

4) new syscall: worth exploring how it would look. I'm especially
interested in flag options (e.g., SET_DEFAULT_EXEC) and how we could
make them only apply to selected controls.

An API prototype of 4), not necessarily with the code yet, might be
valuable.

In general, the "always/madvise/never" policies are really horrible. We
should instead be prioritizing who gets THPs -- and only disable them
for selected workloads.

Splitting THPs up because a process is not allowed to use them,
thereby increasing memory fragmentation, is really suboptimal.
But we don't have anything better right now.

So I would hope that we can at least turn the "always/VM_HUGEPAGE" into
a "prioritize for largest (m)THPs possible" in a distant future.

If only changing the semantics of VM_NOHUGEPAGE to mean "deprioritize
for THPs" couldn't break userfaultfd ... :( But maybe that can be worked
around in the future somehow (e.g., when we detect userfaultfd usage,
not sure, ...).

--
Cheers,
David / dhildenb