linux-kernel - Re: [RFC PATCH 4/5] mm/madvise: add PMADV_SET_FORK_EXEC_DEFAULT process

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <e48b0293-5ccd-4205-910c-302b61b12b55@lucifer.local>
Date: Thu, 29 May 2025 15:46:28 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        David Hildenbrand <david@...hat.com>, Vlastimil Babka <vbabka@...e.cz>,
        Jann Horn <jannh@...gle.com>, Arnd Bergmann <arnd@...db.de>,
        Christian Brauner <brauner@...nel.org>, linux-mm@...ck.org,
        linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
        SeongJae Park <sj@...nel.org>, Usama Arif <usamaarif642@...il.com>
Subject: Re: [RFC PATCH 4/5] mm/madvise: add PMADV_SET_FORK_EXEC_DEFAULT
 process_madvise() flag

On Tue, May 20, 2025 at 06:26:09PM -0400, Johannes Weiner wrote:
> On Mon, May 19, 2025 at 09:52:41PM +0100, Lorenzo Stoakes wrote:
> > We intentionally limit this only to flags that we know should function
> > correctly without issue, and to be conservative about this, so we initially
> > limit ourselves only to MADV_HUGEPAGE, MADV_NOHUGEPAGE, that is - setting
> > the VM_HUGEPAGE, VM_NOHUGEPAGE VMA flags.
>
> Hm, but do we actually expect more to show up? Looking at the current
> list of advisories, we have the conventional ones:
>
>        MADV_NORMAL
>               No special treatment.  This is the default.
>
>        MADV_RANDOM
>               Expect page references in random order.  (Hence, read ahead may be less useful than normally.)
>
>        MADV_SEQUENTIAL
>               Expect page references in sequential order.  (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)
>
>        MADV_WILLNEED
>               Expect access in the near future.  (Hence, it might be a good idea to read some pages ahead.)
>
>        MADV_DONTNEED
>               Do not expect access in the near future.  (For the time being, the application is finished with the given range, so the kernel can free resources associated with it.)
>
> where only RANDOM and SEQUENTIAL are actual policies. But since those
> apply to file mappings only, they don't seem to make much sense for a
> process-wide setting.
>
> For Linux-specific advisories we have
>
> 	MADV_REMOVE		- action
> 	MADV_DONTFORK		- policy, but not sure how this could work as a process-wide thing
> 	MADV_HWPOISON		- action
> 	MADV_MERGEABLE		- policy, but we have a prctl() for process-wide settings
> 	MADV_SOFTOFFLINE	- action
> 	MADV_HUGEPAGE		- policy, but we have a prctl() for process-wide disabling
> 	MADV_COLLAPSE		- action
> 	MADV_DONTDUMP		- policy, but there is /proc/<pid>/coredump_filter for process-wide settings
> 	MADV_FREE		- action
> 	MADV_WIPEONFORK		- policy, but similar to DONTFORK, not sure how this would work process-wide
> 	MADV_COLD		- action
> 	MADV_PAGEOUT		- action
> 	MADV_POPULATE_READ	- action
> 	MADV_POPULATE_WRITE	- action
> 	MADV_GUARD_INSTALL	- action
>
> So the vast majority of advisories are either one-off actions, or they
> are policies that naturally only make sense for very specific ranges.
>
> KSM and THP really seem like the notable exception[1], rather than a
> rule. And we already *have* prctl() ops to modify per-process policies
> for these. So I'm reluctant to agree we should drill open and expand
> madvise() to cover them - especially with the goal or the expectation
> that THP per-process policies shouldn't matter that much down the line.
>
> I will admit, I don't hate prctl() as much as you do. It *is* a fairly
> broad interface - setting per-process policies. So it's bound to pick
> odds and ends of various subsystems that don't quite fit elsewhere.
>
> Is it the right choice everywhere? Of course not. And can its
> broadness be abused to avoid real interface design? Absolutely.
>
> I don't think that makes it inherently bad, though. As long as we make
> an honest effort to find the best home for new knobs.
>
> But IMO, in this case it is. The inheritance-along-the-process-tree
> behavior that we want here is an established pattern in prctl().
>
> And *because* that propagation is a security-sensitive pattern
> (although I don't think the THP policy specifically *is* a security
> issue), to me it makes more sense to keep this behavior confined to
> prctl() at least, and not add it to madvise too.
>
> [1] Maybe we should have added sys_andrea() to cover those, as well as
>     the SECCOMP prctl()s ;)

So sorry to have missed you really excellent and well thought-out reply
here Johannes (grr mail etc.).

You make a really good point, and I think this does overall, sadly, suggest
the concept of a 'general' madvise() call while, in a sense, 'neat',
doesn't quite fit.

I do think it aligns well with a dedicated mctl() API call, have sent out
an RFC discussion thread about this so we can determine if this makes
sense.

Thanks, Lorenzo