[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <e48b0293-5ccd-4205-910c-302b61b12b55@lucifer.local>
Date: Thu, 29 May 2025 15:46:28 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
David Hildenbrand <david@...hat.com>, Vlastimil Babka <vbabka@...e.cz>,
Jann Horn <jannh@...gle.com>, Arnd Bergmann <arnd@...db.de>,
Christian Brauner <brauner@...nel.org>, linux-mm@...ck.org,
linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
SeongJae Park <sj@...nel.org>, Usama Arif <usamaarif642@...il.com>
Subject: Re: [RFC PATCH 4/5] mm/madvise: add PMADV_SET_FORK_EXEC_DEFAULT
process_madvise() flag
On Tue, May 20, 2025 at 06:26:09PM -0400, Johannes Weiner wrote:
> On Mon, May 19, 2025 at 09:52:41PM +0100, Lorenzo Stoakes wrote:
> > We intentionally limit this only to flags that we know should function
> > correctly without issue, and to be conservative about this, so we initially
> > limit ourselves only to MADV_HUGEPAGE, MADV_NOHUGEPAGE, that is - setting
> > the VM_HUGEPAGE, VM_NOHUGEPAGE VMA flags.
>
> Hm, but do we actually expect more to show up? Looking at the current
> list of advisories, we have the conventional ones:
>
> MADV_NORMAL
> No special treatment. This is the default.
>
> MADV_RANDOM
> Expect page references in random order. (Hence, read ahead may be less useful than normally.)
>
> MADV_SEQUENTIAL
> Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)
>
> MADV_WILLNEED
> Expect access in the near future. (Hence, it might be a good idea to read some pages ahead.)
>
> MADV_DONTNEED
> Do not expect access in the near future. (For the time being, the application is finished with the given range, so the kernel can free resources associated with it.)
>
> where only RANDOM and SEQUENTIAL are actual policies. But since those
> apply to file mappings only, they don't seem to make much sense for a
> process-wide setting.
>
> For Linux-specific advisories we have
>
> MADV_REMOVE - action
> MADV_DONTFORK - policy, but not sure how this could work as a process-wide thing
> MADV_HWPOISON - action
> MADV_MERGEABLE - policy, but we have a prctl() for process-wide settings
> MADV_SOFTOFFLINE - action
> MADV_HUGEPAGE - policy, but we have a prctl() for process-wide disabling
> MADV_COLLAPSE - action
> MADV_DONTDUMP - policy, but there is /proc/<pid>/coredump_filter for process-wide settings
> MADV_FREE - action
> MADV_WIPEONFORK - policy, but similar to DONTFORK, not sure how this would work process-wide
> MADV_COLD - action
> MADV_PAGEOUT - action
> MADV_POPULATE_READ - action
> MADV_POPULATE_WRITE - action
> MADV_GUARD_INSTALL - action
>
> So the vast majority of advisories are either one-off actions, or they
> are policies that naturally only make sense for very specific ranges.
>
> KSM and THP really seem like the notable exception[1], rather than a
> rule. And we already *have* prctl() ops to modify per-process policies
> for these. So I'm reluctant to agree we should drill open and expand
> madvise() to cover them - especially with the goal or the expectation
> that THP per-process policies shouldn't matter that much down the line.
>
> I will admit, I don't hate prctl() as much as you do. It *is* a fairly
> broad interface - setting per-process policies. So it's bound to pick
> odds and ends of various subsystems that don't quite fit elsewhere.
>
> Is it the right choice everywhere? Of course not. And can its
> broadness be abused to avoid real interface design? Absolutely.
>
> I don't think that makes it inherently bad, though. As long as we make
> an honest effort to find the best home for new knobs.
>
> But IMO, in this case it is. The inheritance-along-the-process-tree
> behavior that we want here is an established pattern in prctl().
>
> And *because* that propagation is a security-sensitive pattern
> (although I don't think the THP policy specifically *is* a security
> issue), to me it makes more sense to keep this behavior confined to
> prctl() at least, and not add it to madvise too.
>
> [1] Maybe we should have added sys_andrea() to cover those, as well as
> the SECCOMP prctl()s ;)
So sorry to have missed you really excellent and well thought-out reply
here Johannes (grr mail etc.).
You make a really good point, and I think this does overall, sadly, suggest
the concept of a 'general' madvise() call while, in a sense, 'neat',
doesn't quite fit.
I do think it aligns well with a dedicated mctl() API call, have sent out
an RFC discussion thread about this so we can determine if this makes
sense.
Thanks, Lorenzo
Powered by blists - more mailing lists