[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <rsumpavnbe63jvfz7w63batked4obgnzmz4boeee6q3wiv7smi@rkok2a4vug72>
Date: Wed, 2 Jul 2025 17:24:02 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Nikita Kalyazin <kalyazin@...zon.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Suren Baghdasaryan <surenb@...gle.com>, Peter Xu <peterx@...hat.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Vlastimil Babka <vbabka@...e.cz>, Muchun Song <muchun.song@...ux.dev>,
Mike Rapoport <rppt@...nel.org>, Hugh Dickins <hughd@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
James Houghton <jthoughton@...gle.com>, Michal Hocko <mhocko@...e.com>,
David Hildenbrand <david@...hat.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Oscar Salvador <osalvador@...e.de>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Ujwal Kundur <ujwal.kundur@...il.com>
Subject: Re: [PATCH v2 1/4] mm: Introduce vm_uffd_ops API
* Nikita Kalyazin <kalyazin@...zon.com> [250702 13:08]:
>
>
> On 02/07/2025 16:56, Lorenzo Stoakes wrote:
> > On Tue, Jul 01, 2025 at 10:04:28AM -0700, Suren Baghdasaryan wrote:
> > > On Mon, Jun 30, 2025 at 3:16 AM Lorenzo Stoakes
> > > <lorenzo.stoakes@...cle.com> wrote:
> > > > This feels like you're trying to put mm functionality outside of mm?
> > >
> > > To second that, two things stick out for me here:
> > > 1. uffd_copy and uffd_get_folio seem to be at different abstraction
> > > levels. uffd_copy is almost the entire copy operation for VM_SHARED
> > > VMAs while uffd_get_folio is a small part of the continue operation.
> > > 2. shmem_mfill_atomic_pte which becomes uffd_copy for shmem in the
> > > last patch is quite a complex function which itself calls some IMO
> > > pretty internal functions like mfill_atomic_install_pte(). Expecting
> > > modules to implement such functionality seems like a stretch to me but
> > > maybe this is for some specialized modules which are written by mm
> > > experts only?
> >
> > To echo what Liam said - I don't think we can truly rely on expertise here
> > (we make enough mistakes in core mm for that to be a dubious proposition
> > even tere :) and even if experts were involved, having core mm
> > functionality outside of core mm carries significant risk - we are
> > constantly changing things, including assumptions around sensitive topics
> > such as locking (think VMA locking) - having code elsewhere significantly
> > increases the risk of missing things.
> >
> > I am also absolutely, to be frank, not going to accept us EXPORT()'ing
> > anything core.
> >
> > Page table manipulation really must rely in core mm and arch code only, it
> > is easily some of the most subtle, confusing and dangerous code in mm (I
> > have spent subtantial hours banging my head against it recently), and again
> > - subject to constant change.
> >
> > But to come back to Liam's comments and to reiterate what I was referring
> > to earlier, even permitting drivers to have access to VMAs is _highly_
> > problematic and has resulted in very real bugs and subtle issues that took
> > many hours, much stress + gnashing of teeth to adress.
>
> The main target of this change is the implementation of UFFD for
> KVM/guest_memfd (examples: [1], [2]) to avoid bringing KVM-specific code
> into the mm codebase. We usually mean KVM by the "drivers" in this context,
> and it is already somewhat "knowledgeable" of the mm. I don't think there
> are existing use cases for other drivers to implement this at the moment.
>
> Although I can't see new exports in this series, there is now a way to limit
> exports to particular modules [3]. Would it help if we only do it for KVM
> initially (if/when actually needed)?
>
> [1]
> https://lore.kernel.org/all/114133f5-0282-463d-9d65-3143aa658806@amazon.com/
> [2]
> https://lore.kernel.org/all/7666ee96-6f09-4dc1-8cb2-002a2d2a29cf@amazon.com/
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git/commit/?h=kbuild&id=707f853d7fa3ce323a6875487890c213e34d81a0
>
That's because the entry point is from a function pointer, so [3] won't
help at all.
It is recreating the situation that existed for the vma through the
vm_ops in mmap, but for uffd. And at a lower level (page tables). I do not
want to relive that experience.
We are not doing this. It is for the benefit of everyone that we are
not doing this.
We need to find another way.
Regards,
Liam
Powered by blists - more mailing lists