lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <diqz7bxh386h.fsf@google.com>
Date: Mon, 29 Sep 2025 09:43:50 +0000
From: Ackerley Tng <ackerleytng@...gle.com>
To: Fuad Tabba <tabba@...gle.com>, Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>, 
	Janosch Frank <frankja@...ux.ibm.com>, Claudio Imbrenda <imbrenda@...ux.ibm.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, David Hildenbrand <david@...hat.com>, roypat@...zon.co.uk, 
	"Kalyazin, Nikita" <kalyazin@...zon.co.uk>
Subject: Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject
 user page faults if not set

Fuad Tabba <tabba@...gle.com> writes:

> Hi Sean,
>
> On Fri, 26 Sept 2025 at 17:31, Sean Christopherson <seanjc@...gle.com> wrote:
>>
>> Add a guest_memfd flag to allow userspace to state that the underlying
>> memory should be configured to be shared by default, and reject user page
>> faults if the guest_memfd instance's memory isn't shared by default.
>> Because KVM doesn't yet support in-place private<=>shared conversions, all
>> guest_memfd memory effectively follows the default state.
>>
>> Alternatively, KVM could deduce the default state based on MMAP, which for
>> all intents and purposes is what KVM currently does.  However, implicitly
>> deriving the default state based on MMAP will result in a messy ABI when
>> support for in-place conversions is added.
>>
>> For x86 CoCo VMs, which don't yet support MMAP, memory is currently private
>> by default (otherwise the memory would be unusable).  If MMAP implies
>> memory is shared by default, then the default state for CoCo VMs will vary
>> based on MMAP, and from userspace's perspective, will change when in-place
>> conversion support is added.  I.e. to maintain guest<=>host ABI, userspace
>> would need to immediately convert all memory from shared=>private, which
>> is both ugly and inefficient.  The inefficiency could be avoided by adding
>> a flag to state that memory is _private_ by default, irrespective of MMAP,
>> but that would lead to an equally messy and hard to document ABI.
>>
>> Bite the bullet and immediately add a flag to control the default state so
>> that the effective behavior is explicit and straightforward.
>>

I like having this flag, but didn't propose this because I thought folks
depending on the default being shared (Patrick/Nikita) might have their
usage broken.

>> Fixes: 3d3a04fad25a ("KVM: Allow and advertise support for host mmap() on guest_memfd files")
>> Cc: David Hildenbrand <david@...hat.com>
>> Cc: Fuad Tabba <tabba@...gle.com>
>> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
>> ---
>>  Documentation/virt/kvm/api.rst                 | 10 ++++++++--
>>  include/uapi/linux/kvm.h                       |  3 ++-
>>  tools/testing/selftests/kvm/guest_memfd_test.c |  5 +++--
>>  virt/kvm/guest_memfd.c                         |  6 +++++-
>>  4 files changed, 18 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index c17a87a0a5ac..4dfe156bbe3c 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -6415,8 +6415,14 @@ guest_memfd range is not allowed (any number of memory regions can be bound to
>>  a single guest_memfd file, but the bound ranges must not overlap).
>>
>>  When the capability KVM_CAP_GUEST_MEMFD_MMAP is supported, the 'flags' field
>> -supports GUEST_MEMFD_FLAG_MMAP.  Setting this flag on guest_memfd creation
>> -enables mmap() and faulting of guest_memfd memory to host userspace.
>> +supports GUEST_MEMFD_FLAG_MMAP and  GUEST_MEMFD_FLAG_DEFAULT_SHARED.  Setting
>
> There's an extra space between `and` and `GUEST_MEMFD_FLAG_DEFAULT_SHARED`.
>

+1 on this. Also, would you consider putting the concept of "at creation
time" or "at initialization time" into the name of the flag?

"Default" could be interpreted as "whenever a folio is allocated for
this guest_memfd", the memory the folio represents is by default
shared.

What we want to represent is that when the guest_memfd is created,
memory at all indices are initialized as shared.

Looking a bit further, when conversion is supported, if this flag is not
specified, then all the indices are initialized as private, right?

>> +the MMAP flag on guest_memfd creation enables mmap() and faulting of guest_memfd
>> +memory to host userspace (so long as the memory is currently shared).  Setting
>> +DEFAULT_SHARED makes all guest_memfd memory shared by default (versus private
>> +by default).  Note!  Because KVM doesn't yet support in-place private<=>shared
>> +conversions, DEFAULT_SHARED must be specified in order to fault memory into
>> +userspace page tables.  This limitation will go away when in-place conversions
>> +are supported.
>
> I think that a more accurate (and future proof) description of the
> mmap flag could be something along the lines of:
>

+1 on these suggestions, I agree that making the concepts of SHARED vs
MMAP orthogonal from the start is more future proof.

> + Setting GUEST_MEMFD_FLAG_MMAP enables using mmap() on the file descriptor.
>
> + Setting GUEST_MEMFD_FLAG_DEFAULT_SHARED makes all memory in the file shared
> + by default

See above, I'd prefer clarifying this as "at initialization time" or
something similar.

> , as opposed to private. Shared memory can be faulted into host
> + userspace page tables. Private memory cannot.
>
>>  When the KVM MMU performs a PFN lookup to service a guest fault and the backing
>>  guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 6efa98a57ec1..38a2c083b6aa 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1599,7 +1599,8 @@ struct kvm_memory_attributes {
>>  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
>>
>>  #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
>> -#define GUEST_MEMFD_FLAG_MMAP  (1ULL << 0)
>> +#define GUEST_MEMFD_FLAG_MMAP          (1ULL << 0)
>> +#define GUEST_MEMFD_FLAG_DEFAULT_SHARED        (1ULL << 1)
>>
>>  struct kvm_create_guest_memfd {
>>         __u64 size;
>> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
>> index b3ca6737f304..81b11a958c7a 100644
>> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
>> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
>> @@ -274,7 +274,7 @@ static void test_guest_memfd(unsigned long vm_type)
>>         vm = vm_create_barebones_type(vm_type);
>>
>>         if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
>> -               flags |= GUEST_MEMFD_FLAG_MMAP;
>> +               flags |= GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_DEFAULT_SHARED;
>>
>>         test_create_guest_memfd_multiple(vm);
>>         test_create_guest_memfd_invalid_sizes(vm, flags, page_size);
>> @@ -337,7 +337,8 @@ static void test_guest_memfd_guest(void)
>>                     "Default VM type should always support guest_memfd mmap()");
>>
>>         size = vm->page_size;
>> -       fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP);
>> +       fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP |
>> +                                            GUEST_MEMFD_FLAG_DEFAULT_SHARED);
>>         vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, size, NULL, fd, 0);
>>
>>         mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index 08a6bc7d25b6..19f05a45be04 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -328,6 +328,9 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>>         if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
>>                 return VM_FAULT_SIGBUS;
>>
>> +       if (!((u64)inode->i_private & GUEST_MEMFD_FLAG_DEFAULT_SHARED))
>> +               return VM_FAULT_SIGBUS;
>> +
>>         folio = kvm_gmem_get_folio(inode, vmf->pgoff);
>>         if (IS_ERR(folio)) {
>>                 int err = PTR_ERR(folio);
>> @@ -525,7 +528,8 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>>         u64 valid_flags = 0;
>>
>>         if (kvm_arch_supports_gmem_mmap(kvm))
>> -               valid_flags |= GUEST_MEMFD_FLAG_MMAP;
>> +               valid_flags |= GUEST_MEMFD_FLAG_MMAP |
>> +                              GUEST_MEMFD_FLAG_DEFAULT_SHARED;
>
> At least for now, GUEST_MEMFD_FLAG_DEFAULT_SHARED and
> GUEST_MEMFD_FLAG_MMAP don't make sense without each other. Is it worth
> checking for that, at least until we have in-place conversion? Having
> only GUEST_MEMFD_FLAG_DEFAULT_SHARED set, but GUEST_MEMFD_FLAG_MMAP,
> isn't a useful combination.
>

I think it's okay to have the two flags be orthogonal from the start.

Reviewed-by: Ackerley Tng <ackerleytng@...gle.com>

> That said, these are all nits, I'll leave it to you. With that:
>
> Reviewed-by: Fuad Tabba <tabba@...gle.com>
> Tested-by: Fuad Tabba <tabba@...gle.com>
>
> Cheers,
> /fuad
>
>
>
>>
>>         if (flags & ~valid_flags)
>>                 return -EINVAL;
>> --
>> 2.51.0.536.g15c5d4f767-goog
>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ