[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dd062c92-faa9-46a6-99a8-bcc46209e102@redhat.com>
Date: Tue, 20 May 2025 17:28:35 +0200
From: David Hildenbrand <david@...hat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: "Liam R . Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
Arnd Bergmann <arnd@...db.de>, Christian Brauner <brauner@...nel.org>,
linux-mm@...ck.org, linux-arch@...r.kernel.org,
linux-kernel@...r.kernel.org, SeongJae Park <sj@...nel.org>,
Usama Arif <usamaarif642@...il.com>
Subject: Re: [RFC PATCH 0/5] add process_madvise() flags to modify behaviour
On 19.05.25 22:52, Lorenzo Stoakes wrote:
> REVIEWERS NOTES:
> ================
>
> This is a VERY EARLY version of the idea, it's relatively untested, and I'm
> 'putting it out there' for feedback. Any serious version of this will add a
> bunch of self-tests to assert correct behaviour and I will more carefully
> confirm everything's working.
>
> This is based on discussion arising from Usama's series [0], SJ's input on
> the thread around process_madvise() behaviour [1] (and a subsequent
> response by me [2]) and prior discussion about a new madvise() interface
> [3].
>
> [0]: https://lore.kernel.org/linux-mm/20250515133519.2779639-1-usamaarif642@gmail.com/
> [1]: https://lore.kernel.org/linux-mm/20250517162048.36347-1-sj@kernel.org/
> [2]: https://lore.kernel.org/linux-mm/e3ba284c-3cb1-42c1-a0ba-9c59374d0541@lucifer.local/
> [3]: https://lore.kernel.org/linux-mm/c390dd7e-0770-4d29-bb0e-f410ff6678e3@lucifer.local/
>
> ================
>
> Currently, we are rather restricted in how madvise() operations
> proceed. While effort has been put in to expanding what process_madvise()
> can do (that is - unrestricted application of advice to the local process
> alongside recent improvements on the efficiency of TLB operations over
> these batvches), we are still constrained by existing madvise() limitations
> and default behaviours.
>
> This series makes use of the currently unused flags field in
> process_madvise() to provide more flexiblity.
>
In general, sounds like an interesting approach.
> It introduces four flags:
>
> 1. PMADV_SKIP_ERRORS
>
> Currently, when an error arises applying advice in any individual VMA
> (keeping in mind that a range specified to madvise() or as part of the
> iovec passed to process_madvise()), the operation stops where it is and
> returns an error.
>
> This might not be the desired behaviour of the user, who may wish instead
> for the operation to be 'best effort'. By setting this flag, that behaviour
> is obtained.
>
> Since process_madvise() would trivially, if skipping errors, simply return
> the input vector size, we instead return the number of entries in the
> vector which completed successfully without error.
I would focus only on adding flags that we absolutely need to make the
use case we have in mind work. We can always add other flags as we see
fit for them (IOW, when really required ;) ).
Regarding MADV_HUGEPAGE / MADV_NOHUGEPAGE, this would not be required,
right?
>
> The PMADV_SKIP_ERRORS flag implicitly implies PMADV_NO_ERROR_ON_UNMAPPED.
>
> 2. PMADV_NO_ERROR_ON_UNMAPPED
>
> Currently madvise() has the peculiar behaviour of, if the range specified
> to it contains unmapped range(s), completing the full operation, but
> ultimately returning -ENOMEM.
>
> In the case of process_madvise(), this is fatal, as the operation will stop
> immediately upon this occurring.
>
> By setting PMADV_NO_ERROR_ON_UNMAPPED, the user can indicate that it wishes
> unmapped areas to simply be entirely ignored.
Again, is this really required? Couldn't we glue that to
PMADV_ENTIRE_ADDRESS_SPACE for our use case? After all, I don't expect
anybody to have something mapped into *the entire address space*.
Well, okay, maybe on 32bit, but still ... :)
>
> 3. PMADV_SET_FORK_EXEC_DEFAULT
>
> It may be desirable for a user to specify that all VMAs mapped in a process
> address space default to having an madvise() behaviour established by
> default, in such a fashion as that this persists across fork/exec.
This is very specific for MADV_HUGEPAGE only, so I wonder how we could
either avoid that flag or just make it clear that it shall stick around ...
Having that sad, PMADV_SET_FORK_EXEC_DEFAULT is rather a suboptimal name :(
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists