[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+EHjTzdX8+MbsYOHAJn6Gkayfei-jE6Q_5HfZhnfwnMijmucw@mail.gmail.com>
Date: Mon, 29 Sep 2025 10:04:46 +0100
From: Fuad Tabba <tabba@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>,
Janosch Frank <frankja@...ux.ibm.com>, Claudio Imbrenda <imbrenda@...ux.ibm.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, David Hildenbrand <david@...hat.com>,
Ackerley Tng <ackerleytng@...gle.com>
Subject: Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject
user page faults if not set
Hi Sean,
On Fri, 26 Sept 2025 at 17:31, Sean Christopherson <seanjc@...gle.com> wrote:
>
> Add a guest_memfd flag to allow userspace to state that the underlying
> memory should be configured to be shared by default, and reject user page
> faults if the guest_memfd instance's memory isn't shared by default.
> Because KVM doesn't yet support in-place private<=>shared conversions, all
> guest_memfd memory effectively follows the default state.
>
> Alternatively, KVM could deduce the default state based on MMAP, which for
> all intents and purposes is what KVM currently does. However, implicitly
> deriving the default state based on MMAP will result in a messy ABI when
> support for in-place conversions is added.
>
> For x86 CoCo VMs, which don't yet support MMAP, memory is currently private
> by default (otherwise the memory would be unusable). If MMAP implies
> memory is shared by default, then the default state for CoCo VMs will vary
> based on MMAP, and from userspace's perspective, will change when in-place
> conversion support is added. I.e. to maintain guest<=>host ABI, userspace
> would need to immediately convert all memory from shared=>private, which
> is both ugly and inefficient. The inefficiency could be avoided by adding
> a flag to state that memory is _private_ by default, irrespective of MMAP,
> but that would lead to an equally messy and hard to document ABI.
>
> Bite the bullet and immediately add a flag to control the default state so
> that the effective behavior is explicit and straightforward.
>
> Fixes: 3d3a04fad25a ("KVM: Allow and advertise support for host mmap() on guest_memfd files")
> Cc: David Hildenbrand <david@...hat.com>
> Cc: Fuad Tabba <tabba@...gle.com>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
> Documentation/virt/kvm/api.rst | 10 ++++++++--
> include/uapi/linux/kvm.h | 3 ++-
> tools/testing/selftests/kvm/guest_memfd_test.c | 5 +++--
> virt/kvm/guest_memfd.c | 6 +++++-
> 4 files changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index c17a87a0a5ac..4dfe156bbe3c 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6415,8 +6415,14 @@ guest_memfd range is not allowed (any number of memory regions can be bound to
> a single guest_memfd file, but the bound ranges must not overlap).
>
> When the capability KVM_CAP_GUEST_MEMFD_MMAP is supported, the 'flags' field
> -supports GUEST_MEMFD_FLAG_MMAP. Setting this flag on guest_memfd creation
> -enables mmap() and faulting of guest_memfd memory to host userspace.
> +supports GUEST_MEMFD_FLAG_MMAP and GUEST_MEMFD_FLAG_DEFAULT_SHARED. Setting
There's an extra space between `and` and `GUEST_MEMFD_FLAG_DEFAULT_SHARED`.
> +the MMAP flag on guest_memfd creation enables mmap() and faulting of guest_memfd
> +memory to host userspace (so long as the memory is currently shared). Setting
> +DEFAULT_SHARED makes all guest_memfd memory shared by default (versus private
> +by default). Note! Because KVM doesn't yet support in-place private<=>shared
> +conversions, DEFAULT_SHARED must be specified in order to fault memory into
> +userspace page tables. This limitation will go away when in-place conversions
> +are supported.
I think that a more accurate (and future proof) description of the
mmap flag could be something along the lines of:
+ Setting GUEST_MEMFD_FLAG_MMAP enables using mmap() on the file descriptor.
+ Setting GUEST_MEMFD_FLAG_DEFAULT_SHARED makes all memory in the file shared
+ by default, as opposed to private. Shared memory can be faulted into host
+ userspace page tables. Private memory cannot.
> When the KVM MMU performs a PFN lookup to service a guest fault and the backing
> guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6efa98a57ec1..38a2c083b6aa 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1599,7 +1599,8 @@ struct kvm_memory_attributes {
> #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
>
> #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
> -#define GUEST_MEMFD_FLAG_MMAP (1ULL << 0)
> +#define GUEST_MEMFD_FLAG_MMAP (1ULL << 0)
> +#define GUEST_MEMFD_FLAG_DEFAULT_SHARED (1ULL << 1)
>
> struct kvm_create_guest_memfd {
> __u64 size;
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index b3ca6737f304..81b11a958c7a 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -274,7 +274,7 @@ static void test_guest_memfd(unsigned long vm_type)
> vm = vm_create_barebones_type(vm_type);
>
> if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
> - flags |= GUEST_MEMFD_FLAG_MMAP;
> + flags |= GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_DEFAULT_SHARED;
>
> test_create_guest_memfd_multiple(vm);
> test_create_guest_memfd_invalid_sizes(vm, flags, page_size);
> @@ -337,7 +337,8 @@ static void test_guest_memfd_guest(void)
> "Default VM type should always support guest_memfd mmap()");
>
> size = vm->page_size;
> - fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP);
> + fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP |
> + GUEST_MEMFD_FLAG_DEFAULT_SHARED);
> vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, size, NULL, fd, 0);
>
> mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 08a6bc7d25b6..19f05a45be04 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -328,6 +328,9 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
> if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
> return VM_FAULT_SIGBUS;
>
> + if (!((u64)inode->i_private & GUEST_MEMFD_FLAG_DEFAULT_SHARED))
> + return VM_FAULT_SIGBUS;
> +
> folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> if (IS_ERR(folio)) {
> int err = PTR_ERR(folio);
> @@ -525,7 +528,8 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
> u64 valid_flags = 0;
>
> if (kvm_arch_supports_gmem_mmap(kvm))
> - valid_flags |= GUEST_MEMFD_FLAG_MMAP;
> + valid_flags |= GUEST_MEMFD_FLAG_MMAP |
> + GUEST_MEMFD_FLAG_DEFAULT_SHARED;
At least for now, GUEST_MEMFD_FLAG_DEFAULT_SHARED and
GUEST_MEMFD_FLAG_MMAP don't make sense without each other. Is it worth
checking for that, at least until we have in-place conversion? Having
only GUEST_MEMFD_FLAG_DEFAULT_SHARED set, but GUEST_MEMFD_FLAG_MMAP,
isn't a useful combination.
That said, these are all nits, I'll leave it to you. With that:
Reviewed-by: Fuad Tabba <tabba@...gle.com>
Tested-by: Fuad Tabba <tabba@...gle.com>
Cheers,
/fuad
>
> if (flags & ~valid_flags)
> return -EINVAL;
> --
> 2.51.0.536.g15c5d4f767-goog
>
Powered by blists - more mailing lists