Message-ID: <cover.1747686021.git.lorenzo.stoakes@oracle.com>
Date: Mon, 19 May 2025 21:52:37 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        David Hildenbrand <david@...hat.com>, Vlastimil Babka <vbabka@...e.cz>,
        Jann Horn <jannh@...gle.com>, Arnd Bergmann <arnd@...db.de>,
        Christian Brauner <brauner@...nel.org>, linux-mm@...ck.org,
        linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
        SeongJae Park <sj@...nel.org>, Usama Arif <usamaarif642@...il.com>
Subject: [RFC PATCH 0/5] add process_madvise() flags to modify behaviour

REVIEWERS NOTES:
================

This is a VERY EARLY version of the idea; it's relatively untested, and I'm
'putting it out there' for feedback. Any serious version of this will add a
bunch of self-tests to assert correct behaviour, and I will confirm more
carefully that everything's working.

This is based on discussion arising from Usama's series [0], SJ's input on
the thread around process_madvise() behaviour [1] (and a subsequent
response by me [2]) and prior discussion about a new madvise() interface
[3].

[0]: https://lore.kernel.org/linux-mm/20250515133519.2779639-1-usamaarif642@gmail.com/
[1]: https://lore.kernel.org/linux-mm/20250517162048.36347-1-sj@kernel.org/
[2]: https://lore.kernel.org/linux-mm/e3ba284c-3cb1-42c1-a0ba-9c59374d0541@lucifer.local/
[3]: https://lore.kernel.org/linux-mm/c390dd7e-0770-4d29-bb0e-f410ff6678e3@lucifer.local/

================

Currently, we are rather restricted in how madvise() operations
proceed. While effort has been put into expanding what process_madvise()
can do (that is, unrestricted application of advice to the local process,
alongside recent improvements to the efficiency of TLB operations over
these batches), we are still constrained by existing madvise() limitations
and default behaviours.

This series makes use of the currently unused flags field in
process_madvise() to provide more flexibility.

It introduces four flags:

1. PMADV_SKIP_ERRORS

Currently, when an error arises applying advice to any individual VMA
(keeping in mind that a range specified to madvise(), or as part of the
iovec passed to process_madvise(), may span multiple VMAs), the operation
stops where it is and returns an error.

This might not be the desired behaviour of the user, who may wish instead
for the operation to be 'best effort'. By setting this flag, that behaviour
is obtained.

Since, when skipping errors, process_madvise() would trivially just return
the input vector size, we instead return the number of entries in the
vector which completed without error.

The PMADV_SKIP_ERRORS flag implies PMADV_NO_ERROR_ON_UNMAPPED.
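The return-value semantics above can be illustrated with a minimal userspace model in C. It does not call the real syscall: the MODEL_* names and the flag bit value are hypothetical (the series does not state the UAPI values here), and each iovec entry's outcome is supplied as a precomputed result (bytes advised, or a negative errno). The non-skipping path assumes the current behaviour of reporting the bytes advised before the first failure, or the error if nothing succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bit; the cover letter does not give the real value. */
#define MODEL_PMADV_SKIP_ERRORS (1u << 0)

/*
 * Model of the per-entry loop in process_madvise().
 * results[i] is the outcome for iovec entry i: bytes advised (>= 0)
 * or a negative errno.
 */
static long model_vector_madvise(const long *results, size_t vlen,
				 unsigned int flags)
{
	long bytes = 0;		/* bytes advised by successful entries */
	long ok_entries = 0;	/* entries that completed without error */

	assert(results != NULL || vlen == 0);

	for (size_t i = 0; i < vlen; i++) {
		if (results[i] < 0) {
			if (flags & MODEL_PMADV_SKIP_ERRORS)
				continue;	/* best effort: move on */
			/* Default: stop, report progress or the error. */
			return bytes ? bytes : results[i];
		}
		bytes += results[i];
		ok_entries++;
	}

	/* With the flag, report how many entries succeeded. */
	return (flags & MODEL_PMADV_SKIP_ERRORS) ? ok_entries : bytes;
}
```

For per-entry results of {4096, -EINVAL, 8192}, the default path stops at the second entry and returns 4096, while the PMADV_SKIP_ERRORS path carries on and returns 2.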

2. PMADV_NO_ERROR_ON_UNMAPPED

Currently, madvise() has the peculiar behaviour that, if the range
specified to it contains unmapped range(s), it completes the full operation
but ultimately returns -ENOMEM.

In the case of process_madvise(), this is fatal, as the operation stops
immediately when this occurs.

By setting PMADV_NO_ERROR_ON_UNMAPPED, the user can indicate that it wishes
unmapped areas to simply be entirely ignored.
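The -ENOMEM quirk, and what PMADV_NO_ERROR_ON_UNMAPPED changes, can be modelled in a few lines of C: advice is applied to every mapped part of the range, and an unmapped hole only changes the final return value. This is a hypothetical userspace sketch, not kernel code; struct model_vma, the flag bit value and whole-VMA granularity (the real madvise() may split VMAs at range boundaries) are simplifying assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the cover letter does not give the real value. */
#define MODEL_PMADV_NO_ERROR_ON_UNMAPPED (1u << 1)

struct model_vma {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	bool advised;		/* set once advice has been applied */
};

/*
 * Apply advice to [start, end) over a sorted, non-overlapping VMA array.
 * Every overlapping VMA is advised; a gap only affects the return value.
 */
static int model_madvise_range(struct model_vma *vmas, size_t nr,
			       unsigned long start, unsigned long end,
			       unsigned int flags)
{
	unsigned long pos = start;
	bool saw_gap = false;

	assert(vmas != NULL || nr == 0);

	for (size_t i = 0; i < nr && pos < end; i++) {
		struct model_vma *vma = &vmas[i];

		if (vma->end <= pos)
			continue;		/* entirely before the range */
		if (vma->start >= end)
			break;			/* entirely after the range */
		if (vma->start > pos)
			saw_gap = true;		/* unmapped hole before this VMA */
		vma->advised = true;
		pos = vma->end;
	}
	if (pos < end)
		saw_gap = true;			/* unmapped tail of the range */

	if (saw_gap && !(flags & MODEL_PMADV_NO_ERROR_ON_UNMAPPED))
		return -ENOMEM;
	return 0;
}
```

With VMAs at [0x1000, 0x3000) and [0x5000, 0x6000), advising [0x1000, 0x6000) marks both VMAs as advised yet returns -ENOMEM because of the hole at [0x3000, 0x5000); with the flag set, the same call returns 0.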

3. PMADV_SET_FORK_EXEC_DEFAULT

It may be desirable for a user to specify that all VMAs mapped in a process
address space default to a particular madvise() behaviour, in such a way
that this default persists across fork/exec.

Since this is a very powerful option that would make no sense for many
advice modes, we explicitly permit only known-safe advice here (currently
only MADV_HUGEPAGE and MADV_NOHUGEPAGE).
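The restriction and the inheritance rule can be sketched as follows, assuming a hypothetical, heavily simplified per-process structure (struct model_mm) that carries a single default advice value. Rejecting disallowed advice with -EINVAL is an assumption of this sketch, not something the series states; MADV_HUGEPAGE, MADV_NOHUGEPAGE and MADV_DONTNEED are the existing <sys/mman.h> constants.

```c
#define _DEFAULT_SOURCE		/* expose MADV_HUGEPAGE etc. in glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#define MODEL_NO_DEFAULT (-1)

/* Hypothetical, heavily simplified per-process memory state. */
struct model_mm {
	int default_advice;	/* applied to every new VMA, or MODEL_NO_DEFAULT */
	size_t nr_vmas;		/* number of mappings currently present */
};

/* Accept only advice known to be safe as a process-wide default. */
static int model_set_default_advice(struct model_mm *mm, int advice)
{
	assert(mm != NULL);

	switch (advice) {
	case MADV_HUGEPAGE:
	case MADV_NOHUGEPAGE:
		mm->default_advice = advice;
		return 0;
	default:
		return -EINVAL;	/* assumed error for unsupported advice */
	}
}

/* fork(): the child starts with a copy of the parent's mappings and default. */
static struct model_mm model_fork(const struct model_mm *parent)
{
	return *parent;
}

/* exec(): mappings are discarded, but the default advice is kept. */
static struct model_mm model_exec(const struct model_mm *old)
{
	struct model_mm fresh = {
		.default_advice = old->default_advice,
		.nr_vmas = 0,
	};
	return fresh;
}
```

The point of the whitelist shows up in the first call of a typical sequence: asking for MADV_DONTNEED as a default is refused and leaves the state untouched, whereas MADV_HUGEPAGE is accepted and then survives both model_fork() and model_exec().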

4. PMADV_ENTIRE_ADDRESS_SPACE

If a user wishes to apply madvise() to all VMAs in an address space, it
can be annoying to have to add a single large entry to the input iovec.

So provide syntactic sugar for this: PMADV_ENTIRE_ADDRESS_SPACE. If it is
specified, we expect the user to pass NULL and -1 as the vec and vlen
parameters respectively, so they explicitly acknowledge that these will be
ignored, e.g.:

	process_madvise(PIDFD_SELF, NULL, -1, MADV_HUGEPAGE,
			PMADV_ENTIRE_ADDRESS_SPACE | PMADV_SKIP_ERRORS);

Usually a user ought to set PMADV_SKIP_ERRORS here too, as incompatible
VMAs that ought to be skipped may well be encountered.

If it is not set, PMADV_NO_ERROR_ON_UNMAPPED (which PMADV_SKIP_ERRORS
would otherwise imply) ought to be set since, of course, the entire address
space contains at least some gaps.
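A hypothetical sketch of the argument check for PMADV_ENTIRE_ADDRESS_SPACE and of the flag recommendation for it. Note that vlen is a size_t in the existing process_madvise() signature, so the -1 the caller passes arrives as SIZE_MAX. The flag bit values, struct model_range and MODEL_TASK_SIZE (a 47-bit user address space, as on typical x86-64) are assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical flag bits; the cover letter does not give the real values. */
#define MODEL_PMADV_SKIP_ERRORS          (1u << 0)
#define MODEL_PMADV_NO_ERROR_ON_UNMAPPED (1u << 1)
#define MODEL_PMADV_ENTIRE_ADDRESS_SPACE (1u << 3)

/* Hypothetical end of the user address space (47-bit, as on x86-64). */
#define MODEL_TASK_SIZE (1ull << 47)

struct model_range {
	unsigned long long start;	/* inclusive */
	unsigned long long end;		/* exclusive */
};

/*
 * Check the caller acknowledged that vec and vlen are ignored, then
 * produce the single range it would otherwise have passed in the iovec.
 */
static int model_entire_as_args(const struct iovec *vec, size_t vlen,
				unsigned int flags, struct model_range *out)
{
	if (!(flags & MODEL_PMADV_ENTIRE_ADDRESS_SPACE))
		return -EINVAL;		/* this helper models only the new flag */
	if (vec != NULL || vlen != SIZE_MAX)
		return -EINVAL;		/* expected NULL and (size_t)-1 */

	assert(out != NULL);
	out->start = 0;
	out->end = MODEL_TASK_SIZE;
	return 0;
}

/* The whole address space has gaps, so one of these flags should be set. */
static int model_tolerates_gaps(unsigned int flags)
{
	return (flags & (MODEL_PMADV_SKIP_ERRORS |
			 MODEL_PMADV_NO_ERROR_ON_UNMAPPED)) != 0;
}
```

Requiring the sentinel pair, rather than silently ignoring whatever vec and vlen hold, means a caller who passes a real iovec together with the flag gets an error instead of having their ranges quietly discarded.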

Lorenzo Stoakes (5):
  mm: madvise: refactor madvise_populate()
  mm/madvise: add PMADV_SKIP_ERRORS process_madvise() flag
  mm/madvise: add PMADV_NO_ERROR_ON_UNMAPPED process_madvise() flag
  mm/madvise: add PMADV_SET_FORK_EXEC_DEFAULT process_madvise() flag
  mm/madvise: add PMADV_ENTIRE_ADDRESS_SPACE process_madvise() flag

 include/uapi/asm-generic/mman-common.h |   6 +
 mm/madvise.c                           | 206 +++++++++++++++++++------
 2 files changed, 168 insertions(+), 44 deletions(-)

--
2.49.0
