Message-ID: <aN8U2c8KMXTy6h9Q@google.com>
Date: Thu, 2 Oct 2025 17:12:09 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Vishal Annapurve <vannapurve@...gle.com>
Cc: Ackerley Tng <ackerleytng@...gle.com>, David Hildenbrand <david@...hat.com>,
Patrick Roy <patrick.roy@...ux.dev>, Fuad Tabba <tabba@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>,
Janosch Frank <frankja@...ux.ibm.com>, Claudio Imbrenda <imbrenda@...ux.ibm.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, Nikita Kalyazin <kalyazin@...zon.co.uk>, shivankg@....com
Subject: Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject
user page faults if not set

On Thu, Oct 02, 2025, Vishal Annapurve wrote:
> On Wed, Oct 1, 2025 at 5:04 PM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > On Wed, Oct 1, 2025 at 10:16 AM Sean Christopherson <seanjc@...gle.com> wrote:
> > > >
> > > > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > > > On Wed, Oct 1, 2025 at 9:15 AM Sean Christopherson <seanjc@...gle.com> wrote:
> > > > > >
> > > > > > On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > > > > > > On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@...gle.com> wrote:
> > > > > > > >
> > > > > > > > Oh! This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > > > > > > > KVM_CAP_GUEST_MEMFD_MMAP. Two things:
> > > > > > > >
> > > > > > > > 1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> > > > > > > > that we don't need to add a capability every time a new flag comes along,
> > > > > > > > and so that userspace can gather all flags in a single ioctl. If gmem ever
> > > > > > > > supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> > > > > > > > that's a non-issue relatively speaking.
> > > > > > > >
> > > > > > >
> > > > > > > Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> > > > > > > 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> > > > > > > KVM_CAP_GUEST_MEMFD_CAPS.
> > > > > >
> > > > > > I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
> > > > > > saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
> > > > > > KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.
> > > > >
> > > > > Ah, ok. Then do you envision the guest_memfd caps to still be separate
> > > > > KVM caps per guest_memfd feature?
> > > >
> > > > Yes? No? It depends on the feature and the actual implementation. E.g.
> > > > KVM_CAP_IRQCHIP enumerates support for a whole pile of ioctls.
> > >
> > > I think I am confused. Is the proposal here as follows?
> > > * Use KVM_CAP_GUEST_MEMFD_FLAGS for features that map to guest_memfd
> > > creation flags.
> >
> > No, the proposal is to use KVM_CAP_GUEST_MEMFD_FLAGS to enumerate the set of
> > supported KVM_CREATE_GUEST_MEMFD flags. Whether or not there is an associated
> > "feature" is irrelevant. I.e. it's a very literal "these are the supported
> > flags".
> >
> > > * Use KVM caps for guest_memfd features that don't map to any flags.
> > >
> > > I think in general it would be better to have a KVM cap for each
> > > feature irrespective of the flags as the feature may also need
> > ^^^
> > > additional UAPIs like IOCTLs.
> >
> > If the _only_ user-visible asset that is added is a KVM_CREATE_GUEST_MEMFD flag,
> > a CAP is gross overkill. Even if there are other assets that accompany the new
> > flag, there's no reason we couldn't say "this feature exists if XYZ flag is
> > supported".
> >
> > E.g. it's functionally no different than KVM_CAP_VM_TYPES reporting support for
> > KVM_X86_TDX_VM also effectively reporting support for a _huge_ number of things
> > far beyond being able to create a VM of type KVM_X86_TDX_VM.
> >
>
> What's your opinion about having KVM_CAP_GUEST_MEMFD_MMAP part of
> KVM_CAP_GUEST_MEMFD_CAPS i.e. having a KVM cap covering all features
> of guest_memfd?

I'd much prefer to have both. Describing flags for an ioctl via a bitmask that
doesn't *exactly* match the flags is asking for problems. At best, it will be
confusing. E.g. we'll probably end up with code like this:

	gmem_caps = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);

	if (gmem_caps & KVM_CAP_GUEST_MEMFD_MMAP)
		gmem_flags |= GUEST_MEMFD_FLAG_MMAP;
	if (gmem_caps & KVM_CAP_GUEST_MEMFD_INIT_SHARED)
		gmem_flags |= KVM_CAP_GUEST_MEMFD_INIT_SHARED;

Those types of patterns often lead to typos causing problems (LOL, case in point,
there's a typo above; I'm leaving it to illustrate my point). That can be largely
solved by userspace via macro shenanigans, but userspace really shouldn't have to
jump through hoops for such a simple thing.
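
For contrast, a rough sketch of what userspace could look like when the
capability reports the creation flags literally. The EX_* values are
placeholders rather than real uAPI, the helper names are hypothetical, and
'supported' stands in for whatever a KVM_CAP_GUEST_MEMFD_FLAGS query would
return:

```c
#include <stdint.h>

/* Placeholder values for illustration; not taken from any kernel header. */
#define EX_GUEST_MEMFD_FLAG_MMAP        (1ULL << 0)
#define EX_GUEST_MEMFD_FLAG_INIT_SHARED (1ULL << 1)

/*
 * 'supported' is assumed to be the literal set of flags accepted by
 * KVM_CREATE_GUEST_MEMFD.  Returns non-zero if every flag in 'wanted'
 * is supported.
 */
static int gmem_flags_ok(uint64_t supported, uint64_t wanted)
{
	return (wanted & ~supported) == 0;
}

/* Returns the subset of 'wanted' that can be passed to the ioctl as-is. */
static uint64_t gmem_usable_flags(uint64_t supported, uint64_t wanted)
{
	return wanted & supported;
}
```

No per-flag translation is needed, so there is no second set of names to
mix up.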

An even worse outcome is if userspace does something like:

	gmem_flags = kvm_check_cap(KVM_CAP_GUEST_MEMFD_CAPS);

Which might actually work initially, e.g. if KVM_CAP_GUEST_MEMFD_MMAP and
GUEST_MEMFD_FLAG_MMAP have the same value. But eventually userspace will be sad.
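
A minimal sketch of how that goes wrong, assuming a capability that has no
corresponding creation flag lands between two that do. All bit values and
names here are invented for illustration:

```c
#include <stdint.h>

/* Hypothetical bits reported by a one-size-fits-all caps query. */
#define EX_GMEM_CAP_MMAP         (1u << 0)
#define EX_GMEM_CAP_SOME_IOCTL   (1u << 1)	/* feature with no creation flag */
#define EX_GMEM_CAP_INIT_SHARED  (1u << 2)

/* Hypothetical creation flags; INIT_SHARED no longer lines up with its cap. */
#define EX_GMEM_FLAG_MMAP        (1u << 0)
#define EX_GMEM_FLAG_INIT_SHARED (1u << 1)

/* The explicit translation userspace would be forced to carry around. */
static uint32_t translate_caps_to_flags(uint32_t caps)
{
	uint32_t flags = 0;

	if (caps & EX_GMEM_CAP_MMAP)
		flags |= EX_GMEM_FLAG_MMAP;
	if (caps & EX_GMEM_CAP_INIT_SHARED)
		flags |= EX_GMEM_FLAG_INIT_SHARED;

	return flags;
}
```

With only the MMAP bit set, using the caps value directly as flags happens to
give the right answer. Once INIT_SHARED is reported, the raw caps value and
the translated flags differ, and the shortcut passes a wrong bit to the ioctl.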

Another issue is that, while unlikely, we could run out of KVM_CAP_GUEST_MEMFD_CAPS
bits before we run out of flags.

And if we use memory attributes, we're also guaranteed to have at least one gmem
capability that returns a bitmask separately from a dedicated one-size-fits-all
cap, e.g.

	case KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES:
		if (vm_memory_attributes)
			return 0;

		return kvm_supported_mem_attributes(kvm);

Side topic, looking at this, I don't think we need KVM_CAP_GUEST_MEMFD_CAPS, I'm
pretty sure we can simply extend KVM_CAP_GUEST_MEMFD. E.g.

  #define KVM_GUEST_MEMFD_FEAT_BASIC		(1ULL << 0)
  #define KVM_GUEST_MEMFD_FEAT_FANCY		(1ULL << 1)

	case KVM_CAP_GUEST_MEMFD:
		return KVM_GUEST_MEMFD_FEAT_BASIC |
		       KVM_GUEST_MEMFD_FEAT_FANCY;
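
On the userspace side, consuming such a return value might look roughly like
the sketch below. The FEAT_* names and values are copied from the example
above and are not real uAPI; gmem_has_feature() is a hypothetical helper, and
'ret' stands in for the value returned by KVM_CHECK_EXTENSION on
KVM_CAP_GUEST_MEMFD:

```c
#include <stdbool.h>
#include <stdint.h>

/* Copied from the example above; illustrative only, not real uAPI. */
#define KVM_GUEST_MEMFD_FEAT_BASIC	(1ULL << 0)
#define KVM_GUEST_MEMFD_FEAT_FANCY	(1ULL << 1)

/*
 * Zero or a negative 'ret' is treated as "guest_memfd not supported".
 * Otherwise 'ret' is interpreted as a bitmask of feature bits.
 */
static bool gmem_has_feature(long ret, uint64_t feat)
{
	if (ret <= 0)
		return false;

	return ((uint64_t)ret & feat) == feat;
}
```

A return value of 1 then reads as "basic support only", which would keep
userspace that treats the capability as a plain boolean working.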

> That seems more consistent to me in order for userspace to deduce the
> supported features and assume flags/ioctls/... associated with the feature
> as a group.

If we add a feature that comes with a flag, we could always add both, i.e. a
feature flag for KVM_CAP_GUEST_MEMFD along with the natural enumeration for
KVM_CAP_GUEST_MEMFD_FLAGS. That certainly wouldn't be my first choice, but it's
a possibility, e.g. if it really is the most intuitive solution. But that's
getting quite hypothetical.