[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <54bb09fc-144b-4d61-8bc2-1eca4d278382@lucifer.local>
Date: Thu, 3 Jul 2025 16:55:01 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Peter Xu <peterx@...hat.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Vlastimil Babka <vbabka@...e.cz>,
Suren Baghdasaryan <surenb@...gle.com>,
Muchun Song <muchun.song@...ux.dev>, Mike Rapoport <rppt@...nel.org>,
Hugh Dickins <hughd@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
James Houghton <jthoughton@...gle.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Nikita Kalyazin <kalyazin@...zon.com>, Michal Hocko <mhocko@...e.com>,
David Hildenbrand <david@...hat.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Oscar Salvador <osalvador@...e.de>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Ujwal Kundur <ujwal.kundur@...il.com>
Subject: Re: [PATCH v2 0/4] mm/userfaultfd: modulize memory types
Peter you've ignored a large part of what I've said here, I'm not sure this is
productive.
On Wed, Jul 02, 2025 at 04:36:31PM -0400, Peter Xu wrote:
> On Mon, Jun 30, 2025 at 11:29:30AM +0100, Lorenzo Stoakes wrote:
> > On Fri, Jun 27, 2025 at 11:46:51AM -0400, Peter Xu wrote:
> > > [based on latest akpm/mm-new of June 27th, commit 9be7387ae43f]
> > >
> > > v2 changelog:
> > > - Patch 1
> > > - update English in commit log [David]
> > > - move vm_uffd_ops definition to userfaultfd_k.h [Mike]
> > > - Patch 4
> > > - fix sparse warning on bitwise type conversions [syzbot]
> > > - Commit message updates on explanation of vma_can_userfault check [James]
> > >
> > > v1: https://lore.kernel.org/r/20250620190342.1780170-1-peterx@redhat.com
> > >
> > > This series is an alternative proposal of what Nikita proposed here on the
> > > initial three patches:
> > >
> > > https://lore.kernel.org/r/20250404154352.23078-1-kalyazin@amazon.com
> > >
> > > This is not yet relevant to any guest-memfd support, but paving way for it.
> > > Here, the major goal is to make kernel modules be able to opt-in with any
> > > form of userfaultfd supports, like guest-memfd. This alternative option
> > > should hopefully be cleaner, and avoid leaking userfault details into
> > > vm_ops.fault().
> > >
> > > It also means this series does not depend on anything. It's a pure
> > > refactoring of userfaultfd internals to provide a generic API, so that
> > > other types of files, especially RAM based, can support userfaultfd without
> > > touching mm/ at all.
> >
> > I'm very concerned that this change will simply move core mm functionality out
> > of mm and into drivers where it can bitrot and cause subtle bugs?
> >
> > You're proposing providing stuff like page table state and asking for a folio
> > back from a driver etc.
> >
> > I absolutely am not in favour of us providing core mm internals like this to
> > drivers, and I don't want to see us having to EXPORT() mm internals just to make
> > module-ised uffd code work (I mean I just will flat out refuse to do that).
> >
> > I think we need to think _very_ carefully about how we do this.
> >
> > I also feel like this series is at a really basic level and you've not fully
> > determined what API calls you need.
>
> See:
>
> https://lore.kernel.org/all/aGWVIjmmsmskA4bp@x1.local/#t
OK so it seems the point you're making there is - there's a lot of complexity -
and you'd rather not think about it for now?
My concern is that in _actuality_ you'll need to do a _lot_ more and expose a
_lot_ more mm internals to achieve what you need to.
I am happy for the API to be internal-to-mm-only.
My concerns are really simple:
- exposing page table manipulation outside of mm is something I just cannot
accept us doing for reasons I already mentioned and also Liam
- we should absolutely minimise how much we do expose
- we mustn't have hooks that don't allow us to make sensible decisions in core
mm code.
I think overall I'm just not in favour of us having uffd functionality in
modules, I think it's an abstraction mismatch.
Now if you instead had something like:
enum uffd_minor_fault_handler_type {
UFFD_MINOR_FAULT_DEFAULT_HANDLER,
UFFD_MINOR_FAULT_FOO_HANDLER,
UFFD_MINOR_FAULT_BAR_HANDLER,
etc.
};
And the module could _choose_ which should be used then that's far far better.
Really - hermetically seal the sensitive parts but allow module choice.
You could find ways to do this in a more sophisticated way too by e.g. having a
driver callback that provides a config struct that sets things up.
If we're going 'simple first' and can 'change what we want' why not do it like
this to start?
>
> >
> > I agree that it's sensible to be incremental, but I feel like you sort of need
> > to somewhat prove the case that you can jump from 'incremental version where we
> > only support code in mm/' to supporting arbitrary file system code that might be
> > modules.
> >
> > Because otherwise you're basically _guessing_ that you can do this, possibly, in
> > the future and maybe it's just not the right approach but that's not clear yet?
>
> Did you follow up with the discussions in v1? I copied you too.
>
> https://lore.kernel.org/r/114133f5-0282-463d-9d65-3143aa658806@amazon.com
>
> Would Nikita's work help here? Could you explain what are you asking for
> to prove that this works for us?
Yeah thanks this seems like it actually implements useful functionality. The
point is the API seems currently to be a stub so doesn't really give us much to
go on as to what might ultimately be required.
You say yourself in the series that you'll likely need to expand things and you
already have todos to this effect.
>
> >
> > >
> > > To achieve it, this series introduced a file operation called vm_uffd_ops.
> > > The ops needs to be provided when a file type supports any of userfaultfd.
> > >
> > > With that, I moved both hugetlbfs and shmem over.
> >
> > Well as you say below hugetlbfs is sort of a stub implementation, I wonder
> > whether we'd need quite a bit more to make that work.
> >
> > One thing I'd _really_ like to avoid is us having to add a bunch of hook points
> > into core mm code just for uffd that then call out to some driver.
> >
> > We've encountered such a total nightmare with .mmap() for instance in the past
> > (including stuff that resulted in security issues) because we - simply cannot
> > assume anything - about what the hook implementor might do with the passed
> > parameters.
> >
> > This is really really problematic.
> >
> > I also absolutely hate the:
> >
> > if (uffd)
> > do_something_weird();
> >
> > Pattern, so hopefully this won't proliferate that.
^ you didn't respond to this.
> >
> > >
> > > Hugetlbfs is still very special that it will only use partial of the
> > > vm_uffd_ops API, due to similar reason why hugetlb_vm_op_fault() has a
> > > BUG() and so far hard-coded into core mm. But this should still be better,
> > > because at least hugetlbfs is still always involved in feature probing
> > > (e.g. where it used to not support ZEROPAGE and we have a hard-coded line
> > > to fail that, and some more). Meanwhile after this series, shmem will be
> > > completely converted to the new vm_uffd_ops API; the final vm_uffd_ops for
> > > shmem looks like this:
> > >
> > > static const vm_uffd_ops shmem_uffd_ops = {
> > > .uffd_features = __VM_UFFD_FLAGS,
> > > .uffd_ioctls = BIT(_UFFDIO_COPY) |
> > > BIT(_UFFDIO_ZEROPAGE) |
> > > BIT(_UFFDIO_WRITEPROTECT) |
> > > BIT(_UFFDIO_CONTINUE) |
> > > BIT(_UFFDIO_POISON),
> > > .uffd_get_folio = shmem_uffd_get_folio,
> > > .uffd_copy = shmem_mfill_atomic_pte,
> > > };
> > >
> > > As I mentioned in one of my reply to Nikita, I don't like the current
> > > interface of uffd_copy(), but this will be the minimum change version of
> > > such API to support complete extrenal-module-ready userfaultfd. Here, very
> > > minimal change will be needed from shmem side to support that.
> >
> > Right, maybe a better version of this interface might address some of my
> > concerns... :)
> >
> > >
> > > Meanwhile, the vm_uffd_ops is also not the only place one will need to
> > > provide to support userfaultfd. Normally vm_ops.fault() will also need to
> > > be updated, but that's a generic function and it'll play together with the
> > > new vm_uffd_ops to make everything fly.
> > >
> > > No functional change expected at all after the whole series applied. There
> > > might be some slightly stricter check on uffd ops here and there in the
> > > last patch, but that really shouldn't stand out anywhere to anyone.
> > >
> > > For testing: besides the cross-compilation tests, I did also try with
> > > uffd-stress in a VM to measure any perf difference before/after the change;
> > > The static call becomes a pointer now. I really cannot measure anything
> > > different, which is more or less expected.
> > >
> > > Comments welcomed, thanks.
> > >
> > > Peter Xu (4):
> > > mm: Introduce vm_uffd_ops API
> > > mm/shmem: Support vm_uffd_ops API
> > > mm/hugetlb: Support vm_uffd_ops API
> > > mm: Apply vm_uffd_ops API to core mm
> > >
> > > include/linux/mm.h | 9 +++
> > > include/linux/shmem_fs.h | 14 -----
> > > include/linux/userfaultfd_k.h | 98 +++++++++++++++++++----------
> > > mm/hugetlb.c | 19 ++++++
> > > mm/shmem.c | 28 ++++++++-
> > > mm/userfaultfd.c | 115 +++++++++++++++++++++++++---------
> > > 6 files changed, 207 insertions(+), 76 deletions(-)
> > >
> > > --
> > > 2.49.0
> > >
> >
> > Sorry to be critical, I just want to make sure we're not setting ourselves up
> > for trouble here.
> >
> > I _very much_ support efforts to make uffd more generalised, and ideally to find
> > a way to separate out shmem and hugetlbfs implementation bits, so I support the
> > intent _fully_.
> >
> > I just want to make sure we do it in a safe way :)
>
> Any explicit suggestions (besides objections)?
I gave some above.
I'm fundamentally opposed to us exposing page table manipulation or really any
state subject to sensitive locks to modules.
Cheers, Lorenzo
Powered by blists - more mailing lists