lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250520222609.GD773385@cmpxchg.org>
Date: Tue, 20 May 2025 18:26:09 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	"Liam R . Howlett" <Liam.Howlett@...cle.com>,
	David Hildenbrand <david@...hat.com>,
	Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
	Arnd Bergmann <arnd@...db.de>,
	Christian Brauner <brauner@...nel.org>, linux-mm@...ck.org,
	linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
	SeongJae Park <sj@...nel.org>, Usama Arif <usamaarif642@...il.com>
Subject: Re: [RFC PATCH 4/5] mm/madvise: add PMADV_SET_FORK_EXEC_DEFAULT
 process_madvise() flag

On Mon, May 19, 2025 at 09:52:41PM +0100, Lorenzo Stoakes wrote:
> We intentionally limit this only to flags that we know should function
> correctly without issue, and to be conservative about this, so we initially
> limit ourselves only to MADV_HUGEPAGE, MADV_NOHUGEPAGE, that is - setting
> the VM_HUGEPAGE, VM_NOHUGEPAGE VMA flags.

Hm, but do we actually expect more to show up? Looking at the current
list of advisories, we have the conventional ones:

       MADV_NORMAL
              No special treatment.  This is the default.

       MADV_RANDOM
              Expect page references in random order.  (Hence, read ahead may be less useful than normally.)

       MADV_SEQUENTIAL
              Expect page references in sequential order.  (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)

       MADV_WILLNEED
              Expect access in the near future.  (Hence, it might be a good idea to read some pages ahead.)

       MADV_DONTNEED
              Do not expect access in the near future.  (For the time being, the application is finished with the given range, so the kernel can free resources associated with it.)

where only RANDOM and SEQUENTIAL are actual policies. But since those
apply to file mappings only, they don't seem to make much sense for a
process-wide setting.

For Linux-specific advisories we have

	MADV_REMOVE		- action
	MADV_DONTFORK		- policy, but not sure how this could work as a process-wide thing
	MADV_HWPOISON		- action
	MADV_MERGEABLE		- policy, but we have a prctl() for process-wide settings
	MADV_SOFTOFFLINE	- action
	MADV_HUGEPAGE		- policy, but we have a prctl() for process-wide disabling
	MADV_COLLAPSE		- action
	MADV_DONTDUMP		- policy, but there is /proc/<pid>/coredump_filter for process-wide settings
	MADV_FREE		- action
	MADV_WIPEONFORK		- policy, but similar to DONTFORK, not sure how this would work process-wide
	MADV_COLD		- action
	MADV_PAGEOUT		- action
	MADV_POPULATE_READ	- action
	MADV_POPULATE_WRITE	- action
	MADV_GUARD_INSTALL	- action

So the vast majority of advisories are either one-off actions, or they
are policies that naturally only make sense for very specific ranges.

KSM and THP really seem like the notable exception[1], rather than a
rule. And we already *have* prctl() ops to modify per-process policies
for these. So I'm reluctant to agree we should drill open and expand
madvise() to cover them - especially with the goal or the expectation
that THP per-process policies shouldn't matter that much down the line.

I will admit, I don't hate prctl() as much as you do. It *is* a fairly
broad interface - setting per-process policies. So it's bound to pick
odds and ends of various subsystems that don't quite fit elsewhere.

Is it the right choice everywhere? Of course not. And can its
broadness be abused to avoid real interface design? Absolutely.

I don't think that makes it inherently bad, though. As long as we make
an honest effort to find the best home for new knobs.

But IMO, in this case it is. The inheritance-along-the-process-tree
behavior that we want here is an established pattern in prctl().

And *because* that propagation is a security-sensitive pattern
(although I don't think the THP policy specifically *is* a security
issue), to me it makes more sense to keep this behavior confined to
prctl() at least, and not add it to madvise too.

[1] Maybe we should have added sys_andrea() to cover those, as well as
    the SECCOMP prctl()s ;)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ