Message-ID: <652578cc-eeff-4996-8c80-e26682a57e6d@amazon.com>
Date: Mon, 1 Dec 2025 13:39:38 +0000
From: Nikita Kalyazin <kalyazin@...zon.com>
To: Mike Rapoport <rppt@...nel.org>, <linux-mm@...ck.org>
CC: Andrea Arcangeli <aarcange@...hat.com>, Andrew Morton
	<akpm@...ux-foundation.org>, Axel Rasmussen <axelrasmussen@...gle.com>,
	Baolin Wang <baolin.wang@...ux.alibaba.com>, David Hildenbrand
	<david@...hat.com>, Hugh Dickins <hughd@...gle.com>, James Houghton
	<jthoughton@...gle.com>, "Liam R. Howlett" <Liam.Howlett@...cle.com>,
	"Lorenzo Stoakes" <lorenzo.stoakes@...cle.com>, Michal Hocko
	<mhocko@...e.com>, "Paolo Bonzini" <pbonzini@...hat.com>, Peter Xu
	<peterx@...hat.com>, "Sean Christopherson" <seanjc@...gle.com>, Shuah Khan
	<shuah@...nel.org>, "Suren Baghdasaryan" <surenb@...gle.com>, Vlastimil Babka
	<vbabka@...e.cz>, <linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>,
	<linux-kselftest@...r.kernel.org>
Subject: Re: [PATCH v3 4/5] guest_memfd: add support for userfaultfd minor
 mode



On 30/11/2025 11:18, Mike Rapoport wrote:
> From: "Mike Rapoport (Microsoft)" <rppt@...nel.org>
> 
> userfaultfd notifications about minor page faults are used for live
> migration and snapshotting of VMs with memory backed by shared hugetlbfs or
> tmpfs mappings, as described in detail in commit 7677f7fd8be7 ("userfaultfd:
> add minor fault registration mode").
> 
> To use the same mechanism for VMs that use guest_memfd to map their memory,
> guest_memfd should support userfaultfd minor mode.
> 
> Extend the ->fault() method of guest_memfd with the ability to notify the
> core page fault handler that a page fault requires
> handle_userfault(VM_UFFD_MINOR) to complete, and add an implementation of
> ->get_folio_noalloc() to guest_memfd's vm_ops.
> 
> Reviewed-by: Liam R. Howlett <Liam.Howlett@...cle.com>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@...nel.org>
> ---
>   virt/kvm/guest_memfd.c | 33 ++++++++++++++++++++++++++++++++-
>   1 file changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index ffadc5ee8e04..dca6e373937b 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -4,6 +4,7 @@
>   #include <linux/kvm_host.h>
>   #include <linux/pagemap.h>
>   #include <linux/anon_inodes.h>
> +#include <linux/userfaultfd_k.h>
> 
>   #include "kvm_mm.h"
> 
> @@ -359,7 +360,15 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>          if (!((u64)inode->i_private & GUEST_MEMFD_FLAG_INIT_SHARED))
>                  return VM_FAULT_SIGBUS;
> 
> -       folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> +       folio = filemap_lock_folio(inode->i_mapping, vmf->pgoff);
> +       if (!IS_ERR_OR_NULL(folio) && userfaultfd_minor(vmf->vma)) {
> +               ret = VM_FAULT_UFFD_MINOR;
> +               goto out_folio;
> +       }

I realised that I might have been wrong in [1] when I said that the 
noalloc get_folio was ok for our use case.  Unfortunately, we rely on a 
minor fault being generated even when the page is being allocated.  
Peter and I originally discussed this in [2].  Since we want to populate 
guest memory on demand with content supplied by userspace, we have to be 
able to intercept the very first access, meaning we need either a minor 
or a major UFFD event for that.  We decided to use the minor event at 
the time.  If we have to preserve the shmem semantics (a minor fault is 
raised only for a page already present in the page cache), that forces 
us to implement support for major faults/UFFDIO_COPY.

[1] 
https://lore.kernel.org/all/4405c306-9d7c-4fd6-9ea6-2ed1b73f5c2e@amazon.com
[2] https://lore.kernel.org/kvm/Z9HhTjEWtM58Zfxf@x1.local

> +
> +       if (PTR_ERR(folio) == -ENOENT)
> +               folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> +
>          if (IS_ERR(folio)) {
>                  int err = PTR_ERR(folio);
> 
> @@ -390,8 +399,30 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>          return ret;
>   }
> 
> +#ifdef CONFIG_USERFAULTFD
> +static struct folio *kvm_gmem_get_folio_noalloc(struct inode *inode,
> +                                               pgoff_t pgoff)
> +{
> +       struct folio *folio;
> +
> +       folio = filemap_lock_folio(inode->i_mapping, pgoff);
> +       if (IS_ERR_OR_NULL(folio))
> +               return folio;
> +
> +       if (!folio_test_uptodate(folio)) {
> +               clear_highpage(folio_page(folio, 0));
> +               kvm_gmem_mark_prepared(folio);
> +       }
> +
> +       return folio;
> +}
> +#endif
> +
>   static const struct vm_operations_struct kvm_gmem_vm_ops = {
>          .fault = kvm_gmem_fault_user_mapping,
> +#ifdef CONFIG_USERFAULTFD
> +       .get_folio_noalloc      = kvm_gmem_get_folio_noalloc,
> +#endif
>   };
> 
>   static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> --
> 2.51.0
> 

