Message-ID: <CAGtprH-0B+cDARbK-xPGfx4sva+F1akbkX1gXts2VHaqyDWdzA@mail.gmail.com>
Date: Wed, 1 Oct 2025 09:31:14 -0700
From: Vishal Annapurve <vannapurve@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Ackerley Tng <ackerleytng@...gle.com>, David Hildenbrand <david@...hat.com>,
Patrick Roy <patrick.roy@...ux.dev>, Fuad Tabba <tabba@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>,
Janosch Frank <frankja@...ux.ibm.com>, Claudio Imbrenda <imbrenda@...ux.ibm.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, Nikita Kalyazin <kalyazin@...zon.co.uk>, shivankg@....com
Subject: Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set

On Wed, Oct 1, 2025 at 9:15 AM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@...gle.com> wrote:
> > >
> > > Oh! This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > > KVM_CAP_GUEST_MEMFD_MMAP. Two things:
> > >
> > > 1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> > > that we don't need to add a capability every time a new flag comes along,
> > > and so that userspace can gather all flags in a single ioctl. If gmem ever
> > > supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> > > that's a non-issue relatively speaking.
> > >
> >
> > Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> > 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> > KVM_CAP_GUEST_MEMFD_CAPS.
>
> I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
> saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
> KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.

Ah, ok. Then do you envision the guest_memfd caps still being separate
KVM caps, one per guest_memfd feature?

>
> > 2) IMO they should both support namespace of 64 values at least from the get go.
>
> It's a limitation of KVM_CHECK_EXTENSION, and all of KVM's plumbing for ioctls.
> Because KVM still supports 32-bit architectures, direct returns from ioctls are
> forced to fit in 32-bit values to avoid unintentionally creating different ABI
> for 32-bit vs. 64-bit kernels.
>
> We could add KVM_CHECK_EXTENSION2 or KVM_CHECK_EXTENSION64 or something, but I
> honestly don't see the point. The odds of guest_memfd supporting >32 flags is
> small, and the odds of that happening in the next ~5 years is basically zero.
> All so that userspace can make one syscall instead of two for a path that isn't
> remotely performance critical.
>
> So while I agree that being able to enumerate 64 flags from the get-go would be
> nice to have, it's simply not worth the effort (unless someone has a clever idea).
Ack.
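For what it's worth, the two-query approach could be sketched in
userspace roughly as below. This is only an illustration under stated
assumptions: the SKETCH_* capability IDs, the sketch_check_fn wrapper
type, and the stub are all hypothetical, and no FLAGS2 capability exists
today. The wrapper stands in for a KVM_CHECK_EXTENSION-style call that
hands back one 32-bit value per query.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical capability IDs, for illustration only. They are NOT real
 * uAPI values, and no FLAGS2 capability exists.
 */
enum {
	SKETCH_CAP_GMEM_FLAGS  = 1,	/* would report flag bits 0..31 */
	SKETCH_CAP_GMEM_FLAGS2 = 2,	/* would report flag bits 32..63 */
};

/*
 * Hypothetical wrapper around a KVM_CHECK_EXTENSION-style query: stores
 * the 32-bit result in *val and returns 0, or returns -1 on failure.
 */
typedef int (*sketch_check_fn)(void *ctx, unsigned int cap, uint32_t *val);

/* Build a 64-bit supported-flags mask from two 32-bit queries. */
static uint64_t sketch_gmem_supported_flags(sketch_check_fn check, void *ctx)
{
	uint32_t lo = 0, hi = 0;
	uint64_t mask = 0;

	assert(check != NULL);

	if (check(ctx, SKETCH_CAP_GMEM_FLAGS, &lo) == 0)
		mask |= lo;
	/* An unknown second capability simply contributes no bits. */
	if (check(ctx, SKETCH_CAP_GMEM_FLAGS2, &hi) == 0)
		mask |= (uint64_t)hi << 32;

	return mask;
}

/* Stub standing in for a kernel; ctx points at two canned 32-bit values. */
static int sketch_fake_check(void *ctx, unsigned int cap, uint32_t *val)
{
	const uint32_t *canned = ctx;

	if (cap == SKETCH_CAP_GMEM_FLAGS)
		*val = canned[0];
	else if (cap == SKETCH_CAP_GMEM_FLAGS2)
		*val = canned[1];
	else
		return -1;
	return 0;
}
```

Calling sketch_gmem_supported_flags(sketch_fake_check, canned) with
canned = { 0x3, 0x1 } yields a mask with bits 0, 1 and 32 set, i.e. the
cost is one extra query on a path that is not performance critical.
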
>
> > 3) The reservation scheme for upstream should ideally be LSB's first
> > for the new caps/flags.
>
> We're getting way ahead of ourselves. Nothing needs KVM_CAP_GUEST_MEMFD_CAPS at
> this time, so there's nothing to discuss.
>
> > guest_memfd will achieve multiple features in future, both upstream
> > and in out-of-tree versions to deploy features before they make their
>
> When it comes to upstream uAPI and uABI, out-of-tree kernel code is irrelevant.
>
> > way upstream. Generally the scheme followed by out-of-tree versions is
> > to define a custom UAPI that won't conflict with upstream UAPIs in
> > near future. Having a namespace of 32 values gives little space to
> > avoid the conflict, e.g. features like hugetlb support will have to
> > eat up at least 5 bits from the flags [1].
>
> Why on earth would out-of-tree code use KVM_CAP_GUEST_MEMFD_FLAGS? Providing

I can imagine a scenario where KVM_CAP_GUEST_MEMFD_FLAGS is upstreamed
and more flags land in KVM_CAP_GUEST_MEMFD_FLAGS as supported over
time afterwards. Out-of-tree code may ingest KVM_CAP_GUEST_MEMFD_FLAGS
in between.

> infrastructure to support an infinite (quite literally) number of out-of-tree
> capabilities and sub-ioctls, with practically zero chance of conflict, is not
> difficult. See internal b/378111418.
>
> But as above, this is not upstream's problem to solve.
>
> > [1] https://elixir.bootlin.com/linux/v6.17/source/include/uapi/asm-generic/hugetlb_encode.h#L20
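
To make the bit budget in [1] concrete, here is a rough sketch of how
the existing hugetlb size encoding occupies a 32-bit flags word. The
shift (26) and mask (0x3f) mirror the values in hugetlb_encode.h; the
SKETCH_* names and helper functions are made up for illustration, and
whether guest_memfd would reuse this exact encoding is an assumption,
not something decided in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the shift and mask from hugetlb_encode.h. */
#define SKETCH_HUGETLB_ENCODE_SHIFT	26
#define SKETCH_HUGETLB_ENCODE_MASK	0x3fu

/* Encode log2(page size) into the upper bits of a 32-bit flags word. */
static uint32_t sketch_encode_hugepage(unsigned int log2_size)
{
	/* The size must fit in the field, otherwise bits would be lost. */
	assert(log2_size <= SKETCH_HUGETLB_ENCODE_MASK);
	return (uint32_t)log2_size << SKETCH_HUGETLB_ENCODE_SHIFT;
}

/* Count the flag bits that remain once the size field is reserved. */
static unsigned int sketch_free_flag_bits(void)
{
	uint32_t field = (uint32_t)SKETCH_HUGETLB_ENCODE_MASK
			 << SKETCH_HUGETLB_ENCODE_SHIFT;
	unsigned int free_bits = 0;

	for (unsigned int bit = 0; bit < 32; bit++)
		if (!(field & (UINT32_C(1) << bit)))
			free_bits++;
	return free_bits;
}
```

With this layout a 2MB page (log2 = 21) encodes as 21 << 26, and the
size field alone takes bits 26..31 of the word.
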