Message-ID: <aPe0oWR9-Oj58Asz@x1.local>
Date: Tue, 21 Oct 2025 12:28:17 -0400
From: Peter Xu <peterx@...hat.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
David Hildenbrand <david@...hat.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, Mike Rapoport <rppt@...nel.org>,
Muchun Song <muchun.song@...ux.dev>,
Nikita Kalyazin <kalyazin@...zon.com>,
Vlastimil Babka <vbabka@...e.cz>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
James Houghton <jthoughton@...gle.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Hugh Dickins <hughd@...gle.com>, Michal Hocko <mhocko@...e.com>,
Ujwal Kundur <ujwal.kundur@...il.com>,
Oscar Salvador <osalvador@...e.de>,
Suren Baghdasaryan <surenb@...gle.com>,
Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: [PATCH v4 0/4] mm/userfaultfd: modulize memory types

On Tue, Oct 21, 2025 at 11:51:33AM -0400, Liam R. Howlett wrote:
> * Peter Xu <peterx@...hat.com> [251020 10:12]:
> > On Mon, Oct 20, 2025 at 03:34:47PM +0200, David Hildenbrand wrote:
> > > On 15.10.25 01:14, Peter Xu wrote:
> > > > [based on latest akpm/mm-new of Oct 14th, commit 36c6c5ce1b275]
> > > >
> > > > v4:
> > > > - Some cleanups within vma_can_userfault() [David]
> > > > - Rename uffd_get_folio() to minor_get_folio() [David]
> > > > - Remove uffd_features in vm_uffd_ops, deduce it from supported ioctls [David]
> > > >
> > > > v1: https://lore.kernel.org/r/20250620190342.1780170-1-peterx@redhat.com
> > > > v2: https://lore.kernel.org/r/20250627154655.2085903-1-peterx@redhat.com
> > > > v3: https://lore.kernel.org/r/20250926211650.525109-1-peterx@redhat.com
> > > >
> > > > This series is an alternative proposal of what Nikita proposed here on the
> > > > initial three patches:
> > > >
> > > > https://lore.kernel.org/r/20250404154352.23078-1-kalyazin@amazon.com
> > > >
> > > > This is not yet relevant to any guest-memfd support, but paving way for it.
> > > > Here, the major goal is to make kernel modules be able to opt-in with any
> > > > form of userfaultfd supports, like guest-memfd. This alternative option
> > > > should hopefully be cleaner, and avoid leaking userfault details into
> > > > vm_ops.fault().
> > > >
> > > > It also means this series does not depend on anything. It's a pure
> > > > refactoring of userfaultfd internals to provide a generic API, so that
> > > > other types of files, especially RAM based, can support userfaultfd without
> > > > touching mm/ at all.
> > > >
> > > > To achieve it, this series introduced a file operation called vm_uffd_ops.
> > > > The ops needs to be provided when a file type supports any of userfaultfd.
> > > >
> > > > With that, I moved both hugetlbfs and shmem over, whenever possible. So
> > > > far due to concerns on exposing an uffd_copy() API, the MISSING faults are
> > > > still separately processed and can only be done within mm/. Hugetlbfs kept
> > > > its special paths untouched.
> > > >
> > > > An example of shmem uffd_ops:
> > > >
> > > > static const struct vm_uffd_ops shmem_uffd_ops = {
> > > > .supported_ioctls = BIT(_UFFDIO_COPY) |
> > > > BIT(_UFFDIO_ZEROPAGE) |
> > > > BIT(_UFFDIO_WRITEPROTECT) |
> > > > BIT(_UFFDIO_CONTINUE) |
> > > > BIT(_UFFDIO_POISON),
> > > > .minor_get_folio = shmem_uffd_get_folio,
> > > > };
>
> I think you forgot to add the link to the guest_memfd implementation [1]
> to your cover letter.

I didn't.

https://lore.kernel.org/all/20251014231501.2301398-1-peterx@redhat.com/

To show another sample, this is the patch that Nikita posted to implement
minor fault for guest-memfd (on top of older versions of this series):

https://lore.kernel.org/all/114133f5-0282-463d-9d65-3143aa658806@amazon.com/

>
> > >
> > > This looks better than the previous version to me.
> > >
> > > Long term the goal should be to move all hugetlb/shmem specific stuff out of
> > > mm/hugetlb.c and of course, we won't be adding any new ones to
> > > mm/userfaultfd.c
> > >
> > > I agree with Liam that a better interface could be providing default
> > > handlers for the separate ioctls [1], but there is always the option to
> > > evolve this interface into something like that later.
> >
> > Thanks for accepting this current form.
> >
> > >
> > >
> > > [1] https://lkml.kernel.org/r/frnos5jtmlqvzpcrredcoummuzvllweku5dgp5ii5in6epwnw5@anu4dqsz6shy
> >
> > I have replied to that, here:
> >
> > https://lore.kernel.org/all/aOVEDii4HPB6outm@x1.local/
> >
> > If we ignore hugetlbfs, most of the hooks may not be needed, as explained.
>
> Those were examples.
>
> Hooks allow for all the memory type checking to go away in the code,
> which allows for more readable code and fewer operations per call.
>
> >
> > If we introduce hooks only for hugetlbfs, IMHO it's going backwards. When
> > we want to get rid of hugetlbfs paths, we will have something more to get
> > rid of..
>
> This is just wrong.
>
> It is far easier to remove one function pointer than go through all the
> code and remove the checks for hugetlbfs.
>
> Are you thinking the hooks will just point to the generic function?
> This is the only way I can see your statement making sense. That's not
> the idea I'm trying to communicate.
>
> The idea is that you split the functions into parts that everyone does
> and special parts, then call them in the correct sequence for each type.
> New types need new special parts while using the generic code for the
> majority of the work.
>
> In this way, the memory types are modularized into function pointers
> that all use common code without adding complexity. In fact, knowing
> the context implicitly from the call path means we don't need to check
> the types, which should reduce the complexity.
>
> Then adding a new memory type will call almost all the same functions
> except for special areas.
>
> Removing old memory types would mean removing the special areas only - and
> maybe a function pointer if they are the only user.
>
> The current patch set does not modularize memory, it is creating a
> middleware level where we have to parse a value to figure out what to
> do.
>
> These patches DO expose a method for memory types to be coded in a
> kernel module, which is fundamentally different than modularizing the
> memory types. Different enough to be glossed over on a ML by looking at
> the subject alone.
>
> Yes, one value is better than two values, but no magic values is ideal.
>
> Is it a significant amount of work to remove the magic value by
> fragmenting the code into memory type specific function pointers?
>
> IOW, instead of decoding the value to figure out where to route calls,
> just expose the calls directly in the function pointer layer that you
> are creating? What is the minimum amount of function pointers to get
> the guest_memfd to work without this value being parsed?
>
> [1]. https://lore.kernel.org/all/114133f5-0282-463d-9d65-3143aa658806@amazon.com/
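
For readers following the thread, the two dispatch styles being contrasted
above can be modelled roughly as below. This is only a userspace sketch:
every demo_* name is hypothetical, and none of it is code from the series
or from mm/. Style A loosely mirrors "parse a value, then route by memory
type"; style B mirrors "one function pointer per operation", where a NULL
pointer is assumed to mean the operation is unsupported.

```c
#include <stddef.h>

/* All names below are hypothetical; they only model the two designs. */
#define BIT(n) (1UL << (n))

enum demo_op { DEMO_OP_COPY, DEMO_OP_CONTINUE };

/* "Special parts": work that only one memory type needs. */
static int demo_shmem_continue(void) { return 1; }
static int demo_gmem_continue(void)  { return 2; }

/* "Generic part": shared by every type that supports the operation. */
static int demo_generic_copy(void)   { return 0; }

/* Style A: generic code decodes a mask, then checks the memory type. */
struct demo_mask_ops {
	unsigned long supported;   /* bitmask of advertised operations */
	int is_gmem;               /* stand-in for a memory type check */
};

static int demo_mask_dispatch(const struct demo_mask_ops *ops, enum demo_op op)
{
	if (!(ops->supported & BIT(op)))
		return -1;         /* operation not advertised */
	if (op == DEMO_OP_CONTINUE)
		return ops->is_gmem ? demo_gmem_continue()
				    : demo_shmem_continue();
	return demo_generic_copy();
}

/* Style B: one hook per operation; the call path implies the type. */
struct demo_hook_ops {
	int (*copy)(void);
	int (*cont)(void);
};

static int demo_hook_dispatch(const struct demo_hook_ops *ops, enum demo_op op)
{
	int (*fn)(void) = (op == DEMO_OP_COPY) ? ops->copy : ops->cont;

	return fn ? fn() : -1;     /* NULL hook: unsupported */
}

static const struct demo_mask_ops demo_shmem_mask = {
	.supported = BIT(DEMO_OP_COPY) | BIT(DEMO_OP_CONTINUE),
	.is_gmem = 0,
};
static const struct demo_mask_ops demo_gmem_mask = {
	.supported = BIT(DEMO_OP_CONTINUE),
	.is_gmem = 1,
};
static const struct demo_hook_ops demo_shmem_hooks = {
	.copy = demo_generic_copy,
	.cont = demo_shmem_continue,
};
static const struct demo_hook_ops demo_gmem_hooks = {
	.copy = NULL,
	.cont = demo_gmem_continue,
};
```

In this model both styles return the same results for the same type and
operation; the difference is only where the per-type knowledge lives: in a
branch inside the generic dispatcher (style A) or in the ops table that
each type fills in (style B).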
I don't know what you're looking for.

I think I got most acks from userfaultfd developers who were active in
the past few years, ever since v1...

Then, we got some concern on uffd_copy() API being complicated, it's fine,
I dropped it.

We got some other concern on having a function returning folio pointer. We
talked it all through, luckily, even if I do not know what really happened.

Now, I really don't know what you're suggesting here.

Can you send some patches and show us the code, help everyone to support
guest-memfd minor fault, please?

--
Peter Xu