linux-kernel - Re: [PATCH v4 0/4] mm/userfaultfd: modulize memory types

Message-ID: <aPe0oWR9-Oj58Asz@x1.local>
Date: Tue, 21 Oct 2025 12:28:17 -0400
From: Peter Xu <peterx@...hat.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
	David Hildenbrand <david@...hat.com>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, Mike Rapoport <rppt@...nel.org>,
	Muchun Song <muchun.song@...ux.dev>,
	Nikita Kalyazin <kalyazin@...zon.com>,
	Vlastimil Babka <vbabka@...e.cz>,
	Axel Rasmussen <axelrasmussen@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	James Houghton <jthoughton@...gle.com>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Hugh Dickins <hughd@...gle.com>, Michal Hocko <mhocko@...e.com>,
	Ujwal Kundur <ujwal.kundur@...il.com>,
	Oscar Salvador <osalvador@...e.de>,
	Suren Baghdasaryan <surenb@...gle.com>,
	Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: [PATCH v4 0/4] mm/userfaultfd: modulize memory types

On Tue, Oct 21, 2025 at 11:51:33AM -0400, Liam R. Howlett wrote:
> * Peter Xu <peterx@...hat.com> [251020 10:12]:
> > On Mon, Oct 20, 2025 at 03:34:47PM +0200, David Hildenbrand wrote:
> > > On 15.10.25 01:14, Peter Xu wrote:
> > > > [based on latest akpm/mm-new of Oct 14th, commit 36c6c5ce1b275]
> > > > 
> > > > v4:
> > > > - Some cleanups within vma_can_userfault() [David]
> > > > - Rename uffd_get_folio() to minor_get_folio() [David]
> > > > - Remove uffd_features in vm_uffd_ops, deduce it from supported ioctls [David]
> > > > 
> > > > v1: https://lore.kernel.org/r/20250620190342.1780170-1-peterx@redhat.com
> > > > v2: https://lore.kernel.org/r/20250627154655.2085903-1-peterx@redhat.com
> > > > v3: https://lore.kernel.org/r/20250926211650.525109-1-peterx@redhat.com
> > > > 
> > > > This series is an alternative proposal of what Nikita proposed here on the
> > > > initial three patches:
> > > > 
> > > >    https://lore.kernel.org/r/20250404154352.23078-1-kalyazin@amazon.com
> > > > 
> > > > This is not yet relevant to any guest-memfd support, but paving way for it.
> > > > Here, the major goal is to make kernel modules be able to opt-in with any
> > > > form of userfaultfd supports, like guest-memfd.  This alternative option
> > > > should hopefully be cleaner, and avoid leaking userfault details into
> > > > vm_ops.fault().
> > > > 
> > > > It also means this series does not depend on anything.  It's a pure
> > > > refactoring of userfaultfd internals to provide a generic API, so that
> > > > other types of files, especially RAM based, can support userfaultfd without
> > > > touching mm/ at all.
> > > > 
> > > > To achieve it, this series introduced a file operation called vm_uffd_ops.
> > > > The ops needs to be provided when a file type supports any of userfaultfd.
> > > > 
> > > > With that, I moved both hugetlbfs and shmem over, whenever possible.  So
> > > > far due to concerns on exposing an uffd_copy() API, the MISSING faults are
> > > > still separately processed and can only be done within mm/.  Hugetlbfs kept
> > > > its special paths untouched.
> > > > 
> > > > An example of shmem uffd_ops:
> > > > 
> > > > static const struct vm_uffd_ops shmem_uffd_ops = {
> > > > 	.supported_ioctls	=	BIT(_UFFDIO_COPY) |
> > > > 					BIT(_UFFDIO_ZEROPAGE) |
> > > > 					BIT(_UFFDIO_WRITEPROTECT) |
> > > > 					BIT(_UFFDIO_CONTINUE) |
> > > > 					BIT(_UFFDIO_POISON),
> > > > 	.minor_get_folio	=	shmem_uffd_get_folio,
> > > > };
> 
> I think you forgot to add the link to the guest_memfd implementation [1]
> to your cover letter.

I didn't.

https://lore.kernel.org/all/20251014231501.2301398-1-peterx@redhat.com/

    To show another sample, this is the patch that Nikita posted to implement
    minor fault for guest-memfd (on top of older versions of this series):

      https://lore.kernel.org/all/114133f5-0282-463d-9d65-3143aa658806@amazon.com/


> 
> > > 
> > > This looks better than the previous version to me.
> > > 
> > > Long term the goal should be to move all hugetlb/shmem specific stuff out of
> > > mm/hugetlb.c and of course, we won't be adding any new ones to
> > > mm/userfaultfd.c
> > > 
> > > I agree with Liam that a better interface could be providing default
> > > handlers for the separate ioctls [1], but there is always the option to
> > > evolve this interface into something like that later.
> > 
> > Thanks for accepting this current form.
> > 
> > > 
> > > 
> > > [1] https://lkml.kernel.org/r/frnos5jtmlqvzpcrredcoummuzvllweku5dgp5ii5in6epwnw5@anu4dqsz6shy
> > 
> > I have replied to that, here:
> > 
> > https://lore.kernel.org/all/aOVEDii4HPB6outm@x1.local/
> > 
> > If we ignore hugetlbfs, most of the hooks may not be needed, as explained.
> 
> Those were examples.
> 
> Hooks allow for all the memory type checking to go away in the code,
> which allows for more readable code and fewer operations per call.
> 
> > 
> > If we introduce hooks only for hugetlbfs, IMHO it's going backwards.  When
> > we want to get rid of hugetlbfs paths, we will have something more to get
> > rid of..
> 
> This is just wrong.
> 
> It is far easier to remove one function pointer than go through all the
> code and remove the checks for hugetlbfs.
> 
> Are you thinking the hooks will just point to the generic function?
> This is the only way I can see your statement making sense.  That's not
> the idea I'm trying to communicate.
> 
> The idea is that you split the functions into parts that everyone does
> and special parts, then call them in the correct sequence for each type.
> New types need new special parts while using the generic code for the
> majority of the work.
> 
> In this way, the memory types are modularized into function pointers
> that all use common code without adding complexity.  In fact, knowing
> implicitly which context from call path means we don't need to check the
> types and should be able to reduce the complexity.
> 
> Then adding a new memory type will call almost all the same functions
> except for special areas.
> 
> Removing old memory types would mean removing the special areas only - and
> maybe a function pointer if they are the only user.
> 
> The current patch set does not modularize memory, it is creating a
> middleware level where we have to parse a value to figure out what to
> do.
> 
> These patches DO expose a method for memory types to be coded in a
> kernel module, which is fundamentally different than modularizing the
> memory types.  Different enough to be glossed over on a ML by looking at
> the subject alone.
> 
> Yes, one value is better than two values, but no magic values is ideal.
> 
> Is it a significant amount of work to remove the magic value by
> fragmenting the code into memory type specific function pointers?
> 
> IOW, instead of decoding the value to figure out where to route calls,
> just expose the calls directly in the function pointer layer that you
> are creating?  What is the minimum amount of function pointers to get
> the guest_memfd to work without this value being parsed?
> 
> [1].  https://lore.kernel.org/all/114133f5-0282-463d-9d65-3143aa658806@amazon.com/
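
For readers following the thread, the two designs being contrasted in the
quoted text can be sketched roughly as below. This is a user-space model
under stated assumptions, not kernel code and not either author's patch:
every demo_* and DEMO_* name is hypothetical, the "folio" is just an int,
and "style B" is only one possible reading of the per-operation function
pointer idea.

```c
#include <errno.h>

/*
 * User-space model only. Every demo_* / DEMO_* name is hypothetical and
 * exists purely for illustration; none of them is a kernel API.
 */
enum demo_ioctl { DEMO_COPY, DEMO_ZEROPAGE, DEMO_CONTINUE, DEMO_NR_IOCTLS };

#define DEMO_BIT(nr) (1UL << (nr))

/* Stand-in for a folio lookup: returns the page offset as a fake handle. */
static int demo_get_folio(unsigned long pgoff)
{
	return (int)pgoff;
}

/* Style A: the memory type advertises a mask and the core decodes it. */
struct demo_mask_ops {
	unsigned long supported_ioctls;
	int (*minor_get_folio)(unsigned long pgoff);
};

static int demo_dispatch_mask(const struct demo_mask_ops *ops,
			      enum demo_ioctl op, unsigned long pgoff)
{
	if (!(ops->supported_ioctls & DEMO_BIT(op)))
		return -EINVAL;
	if (op == DEMO_CONTINUE)
		return ops->minor_get_folio ? ops->minor_get_folio(pgoff) : -EINVAL;
	return 0;	/* other operations stay on the core's generic path */
}

/* Style B: one hook per operation; a NULL hook means "not supported". */
struct demo_hook_ops {
	int (*handler[DEMO_NR_IOCTLS])(unsigned long pgoff);
};

static int demo_dispatch_hook(const struct demo_hook_ops *ops,
			      enum demo_ioctl op, unsigned long pgoff)
{
	return ops->handler[op] ? ops->handler[op](pgoff) : -EINVAL;
}

/* A type that supports only the minor-fault (CONTINUE) path, both ways. */
static const struct demo_mask_ops demo_minor_only_mask = {
	.supported_ioctls = DEMO_BIT(DEMO_CONTINUE),
	.minor_get_folio  = demo_get_folio,
};

static const struct demo_hook_ops demo_minor_only_hook = {
	.handler = { [DEMO_CONTINUE] = demo_get_folio },
};
```

With either table, a CONTINUE request ends up in demo_get_folio() and a
COPY request is rejected with -EINVAL. The model only shows where the
routing decision lives: in the core's decoding of the mask (style A) or in
the table of pointers itself (style B).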

I don't know what you're looking for.

I think I got acks from most of the userfaultfd developers who were active
in the past few years, ever since v1...

Then we got some concerns about the uffd_copy() API being complicated.
That's fine, I dropped it.

We got another concern about having a function return a folio pointer.  We
talked it all through, luckily, even if I do not know what really happened.

Now, I really don't know what you're suggesting here.

Can you send some patches and show us the code, to help everyone support
guest-memfd minor fault, please?

-- 
Peter Xu