[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4czztpp7emy7gnigoa7aap2expmlnrpvhugko7q4ycfj2ikuck@v6aq7tzr6yeq>
Date: Thu, 18 Sep 2025 12:47:41 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: Nikita Kalyazin <kalyazin@...zon.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Peter Xu <peterx@...hat.com>, David Hildenbrand <david@...hat.com>,
Suren Baghdasaryan <surenb@...gle.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Vlastimil Babka <vbabka@...e.cz>,
Muchun Song <muchun.song@...ux.dev>, Hugh Dickins <hughd@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
James Houghton <jthoughton@...gle.com>, Michal Hocko <mhocko@...e.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Oscar Salvador <osalvador@...e.de>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Ujwal Kundur <ujwal.kundur@...il.com>
Subject: Re: [PATCH v2 1/4] mm: Introduce vm_uffd_ops API
* Mike Rapoport <rppt@...nel.org> [250918 04:37]:
> On Wed, Sep 17, 2025 at 12:53:05PM -0400, Liam R. Howlett wrote:
> > * Mike Rapoport <rppt@...nel.org> [250917 05:26]:
> > > Hi Liam,
> > >
> > > On Mon, Sep 08, 2025 at 12:53:37PM -0400, Liam R. Howlett wrote:
> > > >
> > > > Reading through the patches, I'm not entirely sure what you are
> > > > proposing.
> > > >
> > > > What I was hoping to see by a generalization of the memory types is a
> > > > much simpler shared code base until the code hit memory type specific
> > > > areas where a function pointer could be used to keep things from getting
> > > > complicated (or, I guess a switch statement..).
> > > >
> > > > What we don't want is non-mm code specifying values for the function
> > > > pointer and doing what they want, or a function pointer that returns a
> > > > core mm resource (in the old example this was a vma, here it is a
> > > > folio).
> > > >
> > > > From this patch set:
> > > > + * Return: zero if succeeded, negative for errors.
> > > > + */
> > > > + int (*uffd_get_folio)(struct inode *inode, pgoff_t pgoff,
> > > > + struct folio **folio);
> > > >
> > > > This is one of the contention points in the current scenario as the
> > > > folio would be returned.
> > >
> > > I don't see a problem with it. It's not any different from
> > > vma_ops->fault(): a callback for a filesystem to get a folio that will be
> > > mapped afterwards by the mm code.
> > >
> >
> > I disagree, the filesystem vma_ops->fault() is not a config option like
> > this one. So we are on a path to enable uffd by default, and it really
> > needs work beyond this series. Setting up a list head and passing in
> > through every call stack is far from idea.
>
> I don't follow you here. How addition of uffd callbacks guarded by a config
> option to vma_ops leads to enabling uffd by by default?
Any new memory type that uses the above interface now needs uffd
enabled, anyone planning to use guest_memfd needs it enabled, anyone
able to get a module using this interface needs it enabled (by whoever
gives them the kernel they use). Kernel provides now need to enable
UFFD - which is different than the example provided.
>
> > I also think the filesystem model is not one we want to duplicate in mm
> > for memory types - think of the test issues we have now and then have a
> > look at the xfstests support of filesystems [1].
> >
> > So we are on a path of less test coverage, and more code that is
> > actually about mm that is outside of mm. So, is there another way?
>
> There are quite a few vma_ops outside fs/ not covered by xfstest, so the
> test coverage argument is moot at best.
> And anything in the kernel can grab a folio and do whatever it pleases.
Testing filesystems is nothing short of a nightmare and I don't want mm
to march happily towards that end. This interface is endlessly flexible
and thus endlessly broken and working at the same time.
IOW, we have given anyone wanting to implement a new memory type
infinite freedoms to run afoul, but they won't be looking for those
people when things go horribly wrong - they will most likely see a
memory issue and come here. syzbot will see a hang on some mm lock in an
unrelated task, or whatever.
I would rather avoid the endlessly flexible interface to avoid incorrect
uses in favour of a limited selection of choices, that could be expanded
if necessary, but would be more visible to the mm people going in. That
is, people can add new memory types through adding them to mm/ instead
of in driver/ or out of tree.
I could very much see someone looking to use this for a binder-type
driver and that might work out really well! But I don't want someone
doing it and shoving the folio pointer in a custom struct because they
*know* it's fine, so what's the big deal? I don't mean to pick on
binder, but this example comes to mind.
>
> Nevertheless, let's step back for a second and instead focus on the problem
> these patches are trying to solve, which is to allow guest_memfd implement
> UFFD_CONTINUE (or minor fault in other terminology).
Well, this is about modularizing memory types, but the first user is
supposed to be the guest-memfd support.
>
> This means uffd should be able to map a folio that's already in
> guest_memfd page cache to the faulted address. Obviously, the page table
> update happens in uffd. But it still has to find what to map and we need
> some way to let guest_memfd tell that to uffd.
>
> So we need a hook somewhere that will return a folio matching pgoff in
> vma->file->inode.
>
> Do you see a way to implement it otherwise?
I must be missing something.
UFFDIO_CONTINUE currently enters through an ioctl that calls
userfaultfd_continue() -> mfill_atomic_continue()... mfill_atomic() gets
and uses the folio to actually do the work. Right now, we don't hand
out the folio, so what is different here?
I am under the impression that we don't need to return the folio, but
may need to do work on it. That is, we can give the mm side what it
needs to call the related memory type functions to service the request.
For example, one could pass in the inode, pgoff, and memory type and the
mm code could then call the fault handler for that memory type?
I didn't think Nikita had a folio returned in his first three patches
[1], but then they built on other patches and it was difficult to follow
along. Is it because that interface was agreed on in a call on 23 Jan
2025 [2], as somewhat unclearly stated in [1]?
Thanks,
Liam
[1]. https://lore.kernel.org/all/20250404154352.23078-1-kalyazin@amazon.com/
[2]. https://docs.google.com/document/d/1M6766BzdY1Lhk7LiR5IqVR8B8mG3cr-cxTxOrAosPOk/edit?tab=t.0#heading=h.w1126rgli5e3
Powered by blists - more mailing lists