[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aN1TgRpde5hq_FPn@google.com>
Date: Wed, 1 Oct 2025 09:15:32 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Vishal Annapurve <vannapurve@...gle.com>
Cc: Ackerley Tng <ackerleytng@...gle.com>, David Hildenbrand <david@...hat.com>,
Patrick Roy <patrick.roy@...ux.dev>, Fuad Tabba <tabba@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>,
Janosch Frank <frankja@...ux.ibm.com>, Claudio Imbrenda <imbrenda@...ux.ibm.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, Nikita Kalyazin <kalyazin@...zon.co.uk>, shivankg@....com
Subject: Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject
user page faults if not set
On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > Oh! This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > KVM_CAP_GUEST_MEMFD_MMAP. Two things:
> >
> > 1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> > that we don't need to add a capability every time a new flag comes along,
> > and so that userspace can gather all flags in a single ioctl. If gmem ever
> > supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> > that's a non-issue relatively speaking.
> >
>
> Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> KVM_CAP_GUEST_MEMFD_CAPS.
I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.
> 2) IMO they should both support namespace of 64 values at least from the get go.
It's a limitation of KVM_CHECK_EXTENSION, and all of KVM's plumbing for ioctls.
Because KVM still supports 32-bit architectures, direct returns from ioctls are
forced to fit in 32-bit values to avoid unintentionally creating different ABI
for 32-bit vs. 64-bit kernels.
We could add KVM_CHECK_EXTENSION2 or KVM_CHECK_EXTENSION64 or something, but I
honestly don't see the point. The odds of guest_memfd supporting >32 flags is
small, and the odds of that happening in the next ~5 years is basically zero.
All so that userspace can make one syscall instead of two for a path that isn't
remotely performance critical.
So while I agree that being able to enumerate 64 flags from the get-go would be
nice to have, it's simply not worth the effort (unless someone has a clever idea).
> 3) The reservation scheme for upstream should ideally be LSB's first
> for the new caps/flags.
We're getting way ahead of ourselves. Nothing needs KVM_CAP_GUEST_MEMFD_CAPS at
this time, so there's nothing to discuss.
> guest_memfd will achieve multiple features in future, both upstream
> and in out-of-tree versions to deploy features before they make their
When it comes to upstream uAPI and uABI, out-of-tree kernel code is irrelevant.
> way upstream. Generally the scheme followed by out-of-tree versions is
> to define a custom UAPI that won't conflict with upstream UAPIs in
> near future. Having a namespace of 32 values gives little space to
> avoid the conflict, e.g. features like hugetlb support will have to
> eat up at least 5 bits from the flags [1].
Why on earth would out-of-tree code use KVM_CAP_GUEST_MEMFD_FLAGS? Providing
infrastructure to support an infinite (quite literally) number of out-of-tree
capabilities and sub-ioctls, with practically zero chance of conflict, is not
difficult. See internal b/378111418.
But as above, this is not upstream's problem to solve.
> [1] https://elixir.bootlin.com/linux/v6.17/source/include/uapi/asm-generic/hugetlb_encode.h#L20
Powered by blists - more mailing lists