lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <jxekhkhbn7uk23oe24svxrs3vfuhseae57sqagndqgh2e7h33o@kfkygmrvjb5n>
Date: Mon, 8 Sep 2025 12:53:37 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Nikita Kalyazin <kalyazin@...zon.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Peter Xu <peterx@...hat.com>,
        David Hildenbrand <david@...hat.com>, Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Vlastimil Babka <vbabka@...e.cz>,
        Muchun Song <muchun.song@...ux.dev>, Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        James Houghton <jthoughton@...gle.com>, Michal Hocko <mhocko@...e.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Oscar Salvador <osalvador@...e.de>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Ujwal Kundur <ujwal.kundur@...il.com>
Subject: Re: [PATCH v2 1/4] mm: Introduce vm_uffd_ops API

* Nikita Kalyazin <kalyazin@...zon.com> [250901 12:01]:
> 
> 
> On 04/07/2025 20:39, Liam R. Howlett wrote:
> > * Peter Xu <peterx@...hat.com> [250704 11:00]:
> > > On Fri, Jul 04, 2025 at 11:34:15AM +0200, David Hildenbrand wrote:
> > > > On 03.07.25 19:48, Mike Rapoport wrote:
> > > > > On Wed, Jul 02, 2025 at 03:46:57PM -0400, Peter Xu wrote:
> > > > > > On Wed, Jul 02, 2025 at 08:39:32PM +0300, Mike Rapoport wrote:
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > > > The main target of this change is the implementation of UFFD for
> > > > > > > > KVM/guest_memfd (examples: [1], [2]) to avoid bringing KVM-specific code
> > > > > > > > into the mm codebase.  We usually mean KVM by the "drivers" in this context,
> > > > > > > > and it is already somewhat "knowledgeable" of the mm.  I don't think there
> > > > > > > > are existing use cases for other drivers to implement this at the moment.
> > > > > > > > 
> > > > > > > > Although I can't see new exports in this series, there is now a way to limit
> > > > > > > > exports to particular modules [3].  Would it help if we only do it for KVM
> > > > > > > > initially (if/when actually needed)?
> > > > > > > 
> > > > > > > There were talks about pulling out guest_memfd core into mm, but I don't
> > > > > > > remember patches about it. If parts of guest_memfd were already in mm/ that
> > > > > > > would make easier to export uffd ops to it.
> > > > > > 
> > > > > > Do we have a link to such discussion?  I'm also curious whether that idea
> > > > > > was acknowledged by KVM maintainers.
> > > > > 
> > > > > AFAIR it was discussed at one of David's guest_memfd calls
> > > > 
> > > > While it was discussed in the call a couple of times in different context
> > > > (guest_memfd as a library / guest_memfd shim), I think we already discussed
> > > > it back at LPC last year.
> > > > 
> > > > One of the main reasons for doing that is supporting guest_memfd in other
> > > > hypervisors -- the gunyah hypervisor in the kernel wants to make use of it
> > > > as well.
> > > 
> > > I see, thanks for the info. I found the series, it's here:
> > > 
> > > https://lore.kernel.org/all/20241113-guestmem-library-v3-0-71fdee85676b@quicinc.com/
> > > 
> > > Here, the question is whether do we still want to keep explicit calls to
> > > shmem, hugetlbfs and in the future, guest-memfd.  The library-ize of
> > > guest-memfd doesn't change a huge lot on answering this question, IIUC.
> > 
> > Can you explore moving hugetlb_mfill_atomic_pte and
> > shmem_mfill_atomic_pte into mm/userfaultfd.c and generalizing them to
> > use the same code?
> > 
> > That is, split the similar blocks into functions and reduce duplication.
> > 
> > These are under the UFFD config option and are pretty similar.  This
> > will also limit the bleeding of mfill_atomic_mode out of uffd.
> > 
> > 
> > 
> > If you look at what the code does in userfaultfd.c, you can see that
> > some refactoring is necessary for other reasons:
> > 
> > mfill_atomic() calls mfill_atomic_hugetlb(), or it enters a while
> > (src_addr < src_start + len) to call mfill_atomic_pte().. which might
> > call shmem_mfill_atomic_pte() in mm/shmem.c
> > 
> > mfill_atomic_hugetlb() calls, in a while (src_addr < src_start + len)
> > loop and calls hugetlb_mfill_atomic_pte() in mm/hugetlb.c
> > 
> > The shmem call already depends on the vma flags.. which it still does in
> > your patch 4 here.  So you've replaced this:
> > 
> > if (!(dst_vma->vm_flags & VM_SHARED)) {
> > ...
> > } else {
> >          shmem_mfill_atomic_pte()
> > }
> > 
> > With...
> > 
> > if (!(dst_vma->vm_flags & VM_SHARED)) {
> > ...
> > } else {
> > ...
> >          uffd_ops->uffd_copy()
> > }
> > 
> > So, really, what needs to happen first is userfaultfd needs to be
> > sorted.
> > 
> > There's no point of creating a vm_ops_uffd if it will just serve as
> > replacing the call locations of the functions like this, as it has done
> > nothing to simplify the logic.
> > 
> > > However if we want to generalize userfaultfd capability for a type of
> > > memory, we will still need something like the vm_uffd_ops hook to report
> > > such information.  It means drivers can still overwrite these, with/without
> > > an exported mfill_atomic_install_pte() functions.  I'm not sure whether
> > > that eases the concern.
> > 
> > If we work through the duplication and reduction where possible, the
> > path forward may be easier to see.
> > 
> > > 
> > > So to me, generalizing the mem type looks helpful with/without moving
> > > guest-memfd under mm/.
> > 
> > Yes, it should decrease the duplication across hugetlb.c and shmem.c,
> > but I think that userfaultfd is the place to start.
> > 
> > > 
> > > We do have the option to keep hard-code guest-memfd like shmem or
> > > hugetlbfs. This is still "doable", but this likely means guest-memfd
> > > support for userfaultfd needs to be done after that work.  I did quickly
> > > check the status of gunyah hypervisor [1,2,3], I found that all of the
> > > efforts are not yet continued in 2025.  The hypervisor last update was Jan
> > > 2024 with a batch push [1].
> > > 
> > > I still prefer generalizing uffd capabilities using the ops.  That makes
> > > guest-memfd support on MINOR not be blocked and it should be able to be
> > > done concurrently v.s. guest-memfd library.  If guest-memfd library idea
> > > didn't move on, it's non-issue either.
> > > 
> > > I've considered dropping uffd_copy() and MISSING support for vm_uffd_ops if
> > > I'm going to repost - that looks like the only thing that people are
> > > against with, even though that is not my preference, as that'll make the
> > > API half-broken on its own.
> > 
> > The generalisation you did does not generalize much, as I pointed out
> > above, and so it seems less impactful than it could be.
> > 
> > These patches also do not explore what this means for guest_memfd.  So
> > it is not clear that the expected behaviour will serve the need.
> > 
> > You sent a link to an example user.  Can you please keep this work
> > together in the patch set so that we know it'll work for your use case
> > and allows us an easier way to pull down this work so we can examine it.
> 
> Hi Liam, Lorenzo,
> 
> With mmap support in guest_memfd being recently accepted and merged into
> kvm/next [1], UFFDIO_CONTINUE support in guest_memfd becomes a real use
> case.
> 
> From what I understand, it is the API for UFFDIO_COPY (ie the .uffd_copy
> callback) proposed by Peter that raises the safery issues, while the
> generalisation of the checks (.uffd_features, .uffd_ioctls) and
> .uffd_get_folio needed for UFFDIO_CONTINUE do not introduce such concerns.
> In order to unblock the userfaultfd support in guest_memfd, would it be
> acceptable to start with implementing .uffd_get_folio/UFFDIO_CONTINUE only,
> leaving the callback for UFFDIO_COPY for later when we have an idea about a
> safer API and a clear use case for that?

Reading through the patches, I'm not entirely sure what you are
proposing.

What I was hoping to see by a generalization of the memory types is a
much simpler shared code base until the code hit memory type specific
areas where a function pointer could be used to keep things from getting
complicated (or, I guess a switch statement..).

What we don't want is non-mm code specifying values for the function
pointer and doing what they want, or a function pointer that returns a
core mm resource (in the old example this was a vma, here it is a
folio).

>From this patch set:
+        * Return: zero if succeeded, negative for errors.
+        */
+       int (*uffd_get_folio)(struct inode *inode, pgoff_t pgoff,
+                             struct folio **folio);

This is one of the contention points in the current scenario as the
folio would be returned.

If you are going to be manipulating an mm resource it should be in mm
code, especially if it requires mm locks.

Thanks,
Liam

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ