Message-ID: <aN3BhKZkCC4-iphM@google.com>
Date: Wed, 1 Oct 2025 17:04:20 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Vishal Annapurve <vannapurve@...gle.com>
Cc: Ackerley Tng <ackerleytng@...gle.com>, David Hildenbrand <david@...hat.com>, 
	Patrick Roy <patrick.roy@...ux.dev>, Fuad Tabba <tabba@...gle.com>, 
	Paolo Bonzini <pbonzini@...hat.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>, 
	Janosch Frank <frankja@...ux.ibm.com>, Claudio Imbrenda <imbrenda@...ux.ibm.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Nikita Kalyazin <kalyazin@...zon.co.uk>, shivankg@....com
Subject: Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject
 user page faults if not set

On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> On Wed, Oct 1, 2025 at 10:16 AM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > On Wed, Oct 1, 2025 at 9:15 AM Sean Christopherson <seanjc@...gle.com> wrote:
> > > >
> > > > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > > > On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@...gle.com> wrote:
> > > > > >
> > > > > > Oh!  This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > > > > > KVM_CAP_GUEST_MEMFD_MMAP.  Two things:
> > > > > >
> > > > > >  1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> > > > > >     that we don't need to add a capability every time a new flag comes along,
> > > > > >     and so that userspace can gather all flags in a single ioctl.  If gmem ever
> > > > > >     supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> > > > > >     that's a non-issue relatively speaking.
> > > > > >
> > > > >
> > > > > Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> > > > > 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> > > > > KVM_CAP_GUEST_MEMFD_CAPS.
> > > >
> > > > I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
> > > > saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
> > > > KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.
> > >
> > > Ah, ok. Then do you envision the guest_memfd caps to still be separate
> > > KVM caps per guest_memfd feature?
> >
> > Yes?  No?  It depends on the feature and the actual implementation.  E.g.
> > KVM_CAP_IRQCHIP enumerates support for a whole pile of ioctls.
> 
> I think I am confused. Is the proposal here as follows?
> * Use KVM_CAP_GUEST_MEMFD_FLAGS for features that map to guest_memfd
> creation flags.

No, the proposal is to use KVM_CAP_GUEST_MEMFD_FLAGS to enumerate the set of
supported KVM_CREATE_GUEST_MEMFD flags.  Whether or not there is an associated
"feature" is irrelevant.  I.e. it's a very literal "these are the supported
flags".

> * Use KVM caps for guest_memfd features that don't map to any flags.
> 
> I think in general it would be better to have a KVM cap for each
> feature irrespective of the flags as the feature may also need
                                                   ^^^
> additional UAPIs like IOCTLs.

If the _only_ user-visible asset that is added is a KVM_CREATE_GUEST_MEMFD flag,
a CAP is gross overkill.  Even if there are other assets that accompany the new
flag, there's no reason we couldn't say "this feature exists if XYZ flag is
supported".

E.g. it's functionally no different than KVM_CAP_VM_TYPES reporting support for
KVM_X86_TDX_VM also effectively reporting support for a _huge_ number of things
far beyond being able to create a VM of type KVM_X86_TDX_VM.

KVM_CAP_XEN_HVM is a big collection of flags that have very little in common other
than being for Xen emulation.

> I fail to see the benefits of KVM_CAP_GUEST_MEMFD_FLAGS over
> KVM_CAP_GUEST_MEMFD_MMAP:

Adding a new flag doesn't require all of the things that come along with a new
capability.  E.g. there's zero chance of collisions between maintainer sub-trees
(at least as far as capabilities go; if multiple maintainers are adding multiple
gmem flags in the same kernel release, I really hope they'd be coordinating).

Enumerating in userspace is also more natural, e.g. userspace doesn't have to
manually build the mask of valid flags.

Writing documentation should be much easier (much less boilerplate), e.g. the
sum total of uAPI for adding GUEST_MEMFD_FLAG_INIT_SHARED is:

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7ba92f2ced38..754b662a453c 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6438,6 +6438,11 @@ specified via KVM_CREATE_GUEST_MEMFD.  Currently defined flags:
   ============================ ================================================
   GUEST_MEMFD_FLAG_MMAP        Enable using mmap() on the guest_memfd file
                                descriptor.
+  GUEST_MEMFD_FLAG_INIT_SHARED Make all memory in the file shared during
+                               KVM_CREATE_GUEST_MEMFD (memory files created
+                               without INIT_SHARED will be marked private).
+                               Shared memory can be faulted into host userspace
+                               page tables. Private memory cannot.
   ============================ ================================================
 
 When the KVM MMU performs a PFN lookup to service a guest fault and the backing
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b1d52d0c56ec..52f6000ab020 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1599,7 +1599,8 @@ struct kvm_memory_attributes {
 #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
 
 #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
-#define GUEST_MEMFD_FLAG_MMAP  (1ULL << 0)
+#define GUEST_MEMFD_FLAG_MMAP          (1ULL << 0)
+#define GUEST_MEMFD_FLAG_INIT_SHARED   (1ULL << 1)
 
 struct kvm_create_guest_memfd {
        __u64 size;

> 1) It limits the possible values to 32 even though we could pass 64 flags to
> the original ioctl.

So because we're currently limited to 32 flags, we should instead throw in the
towel and artificially limit ourselves to 1 flag (0 or 1)?  Because for all intents
and purposes, that's what we'd be doing.

Again, that is unlikely to be problematic before I retire.  It might not even be
a problem _ever_, because with luck we'll kill off 32-bit KVM in the next few
years and then we can actually leverage returning a "long" from ioctls.  Literally
every capability that returns a mask of flags has this "problem"; it's not notable
or even an issue in practice.

> 2) Userspace has to anyways assume the semantics of each bit position.

Not always.

	uint64_t valid_flags = vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS);
	uint64_t flag;
	int fd;

	for (flag = BIT(0); flag; flag <<= 1) {
		fd = __vm_create_guest_memfd(vm, page_size, flag);
		if (flag & valid_flags) {
			TEST_ASSERT(fd >= 0,
				    "guest_memfd() with flag '0x%lx' should succeed",
				    flag);
			close(fd);
		} else {
			TEST_ASSERT(fd < 0 && errno == EINVAL,
				    "guest_memfd() with flag '0x%lx' should fail with EINVAL",
				    flag);
		}
	}

But pedantry aside, I don't see how this is at all an interesting point.  Yes,
userspace has to know how to use a feature.

> 3) Userspace still has to check for caps for features that carry extra
> UAPI baggage.

That's simply not true.  E.g. see the example with VM types.
 
> KVM_CAP_GUEST_MEMFD_MMAP allows userspace to assume that mmap is
> supported and userspace can just pass in the mmap flag that it anyways
> has to assume.
