[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250520222609.GD773385@cmpxchg.org>
Date: Tue, 20 May 2025 18:26:09 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
David Hildenbrand <david@...hat.com>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
Arnd Bergmann <arnd@...db.de>,
Christian Brauner <brauner@...nel.org>, linux-mm@...ck.org,
linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
SeongJae Park <sj@...nel.org>, Usama Arif <usamaarif642@...il.com>
Subject: Re: [RFC PATCH 4/5] mm/madvise: add PMADV_SET_FORK_EXEC_DEFAULT
process_madvise() flag
On Mon, May 19, 2025 at 09:52:41PM +0100, Lorenzo Stoakes wrote:
> We intentionally limit this only to flags that we know should function
> correctly without issue, and to be conservative about this, so we initially
> limit ourselves only to MADV_HUGEPAGE, MADV_NOHUGEPAGE, that is - setting
> the VM_HUGEPAGE, VM_NOHUGEPAGE VMA flags.
Hm, but do we actually expect more to show up? Looking at the current
list of advisories, we have the conventional ones:
MADV_NORMAL
No special treatment. This is the default.
MADV_RANDOM
Expect page references in random order. (Hence, read ahead may be less useful than normally.)
MADV_SEQUENTIAL
Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)
MADV_WILLNEED
Expect access in the near future. (Hence, it might be a good idea to read some pages ahead.)
MADV_DONTNEED
Do not expect access in the near future. (For the time being, the application is finished with the given range, so the kernel can free resources associated with it.)
where only RANDOM and SEQUENTIAL are actual policies. But since those
apply to file mappings only, they don't seem to make much sense for a
process-wide setting.
For Linux-specific advisories we have
MADV_REMOVE - action
MADV_DONTFORK - policy, but not sure how this could work as a process-wide thing
MADV_HWPOISON - action
MADV_MERGEABLE - policy, but we have a prctl() for process-wide settings
MADV_SOFTOFFLINE - action
MADV_HUGEPAGE - policy, but we have a prctl() for process-wide disabling
MADV_COLLAPSE - action
MADV_DONTDUMP - policy, but there is /proc/<pid>/coredump_filter for process-wide settings
MADV_FREE - action
MADV_WIPEONFORK - policy, but similar to DONTFORK, not sure how this would work process-wide
MADV_COLD - action
MADV_PAGEOUT - action
MADV_POPULATE_READ - action
MADV_POPULATE_WRITE - action
MADV_GUARD_INSTALL - action
So the vast majority of advisories are either one-off actions, or they
are policies that naturally only make sense for very specific ranges.
KSM and THP really seem like the notable exception[1], rather than a
rule. And we already *have* prctl() ops to modify per-process policies
for these. So I'm reluctant to agree we should drill open and expand
madvise() to cover them - especially with the goal or the expectation
that THP per-process policies shouldn't matter that much down the line.
I will admit, I don't hate prctl() as much as you do. It *is* a fairly
broad interface - setting per-process policies. So it's bound to pick
odds and ends of various subsystems that don't quite fit elsewhere.
Is it the right choice everywhere? Of course not. And can its
broadness be abused to avoid real interface design? Absolutely.
I don't think that makes it inherently bad, though. As long as we make
an honest effort to find the best home for new knobs.
But IMO, in this case it is. The inheritance-along-the-process-tree
behavior that we want here is an established pattern in prctl().
And *because* that propagation is a security-sensitive pattern
(although I don't think the THP policy specifically *is* a security
issue), to me it makes more sense to keep this behavior confined to
prctl() at least, and not add it to madvise too.
[1] Maybe we should have added sys_andrea() to cover those, as well as
the SECCOMP prctl()s ;)
Powered by blists - more mailing lists