linux-kernel - Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set

Message-ID: <aN1TgRpde5hq_FPn@google.com>
Date: Wed, 1 Oct 2025 09:15:32 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Vishal Annapurve <vannapurve@...gle.com>
Cc: Ackerley Tng <ackerleytng@...gle.com>, David Hildenbrand <david@...hat.com>, 
	Patrick Roy <patrick.roy@...ux.dev>, Fuad Tabba <tabba@...gle.com>, 
	Paolo Bonzini <pbonzini@...hat.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>, 
	Janosch Frank <frankja@...ux.ibm.com>, Claudio Imbrenda <imbrenda@...ux.ibm.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Nikita Kalyazin <kalyazin@...zon.co.uk>, shivankg@....com
Subject: Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject
 user page faults if not set

On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > Oh!  This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > KVM_CAP_GUEST_MEMFD_MMAP.  Two things:
> >
> >  1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> >     that we don't need to add a capability every time a new flag comes along,
> >     and so that userspace can gather all flags in a single ioctl.  If gmem ever
> >     supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> >     that's a non-issue relatively speaking.
> >
> 
> Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> KVM_CAP_GUEST_MEMFD_CAPS.

I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.
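
A minimal sketch of the single-capability idea, assuming hypothetical flag values (GMEM_FLAG_MMAP, GMEM_FLAG_DEFAULT_SHARED) and a plain function standing in for KVM_CHECK_EXTENSION(KVM_CAP_GUEST_MEMFD_FLAGS). Userspace reads one bitmask and tests whichever flags it cares about, and the create path rejects any bit outside that same mask. This is an illustration, not KVM's actual code.

```c
#include <stdint.h>
#include <errno.h>

/* Hypothetical flag bits for illustration only; not the upstream values. */
#define GMEM_FLAG_MMAP            (1ULL << 0)
#define GMEM_FLAG_DEFAULT_SHARED  (1ULL << 1)

/* Hypothetical kernel-side set of flags accepted by KVM_CREATE_GUEST_MEMFD. */
static const uint64_t gmem_supported_flags = GMEM_FLAG_MMAP | GMEM_FLAG_DEFAULT_SHARED;

/* Stand-in for KVM_CHECK_EXTENSION(KVM_CAP_GUEST_MEMFD_FLAGS):
 * the whole supported set comes back in one (int-sized) return value. */
static int check_extension_gmem_flags(void)
{
    return (int)gmem_supported_flags;
}

/* Create-path validation: any unknown flag bit is rejected. */
static int gmem_create_validate(uint64_t flags)
{
    if (flags & ~gmem_supported_flags)
        return -EINVAL;
    return 0;
}

/* Userspace: one query covers every flag, no per-flag capability needed. */
static int userspace_has_flag(uint64_t flag)
{
    uint64_t supported = (uint32_t)check_extension_gmem_flags();
    return (supported & flag) == flag;
}
```

Adding a new flag then only means setting another bit in the returned mask, rather than allocating another KVM_CAP_* number.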

> 2) IMO they should both support namespace of 64 values at least from the get-go.

It's a limitation of KVM_CHECK_EXTENSION, and all of KVM's plumbing for ioctls.
Because KVM still supports 32-bit architectures, direct returns from ioctls are
forced to fit in 32-bit values to avoid unintentionally creating different ABI
for 32-bit vs. 64-bit kernels.
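
To make the limitation concrete, here is a small model (a sketch, not KVM code) of an ioctl whose direct return value is capped at 32 bits. The flag names are hypothetical; the point is that any bit at position 32 or above is silently lost on the way back to userspace.

```c
#include <stdint.h>

/* Hypothetical flags: one in the low 32 bits, one above bit 31. */
#define HYPOTHETICAL_FLAG_LOW   (1ULL << 0)
#define HYPOTHETICAL_FLAG_HIGH  (1ULL << 40)

/* Model of an ioctl whose direct return is limited to 32 bits so that the
 * ABI is identical on 32-bit and 64-bit kernels. */
static uint32_t ioctl_return_32(uint64_t mask)
{
    return (uint32_t)mask;   /* bits 32..63 are dropped */
}

/* Is the flag still visible after the round trip through the return value? */
static int flag_visible(uint64_t mask, uint64_t flag)
{
    return (ioctl_return_32(mask) & flag) != 0;
}
```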

We could add KVM_CHECK_EXTENSION2 or KVM_CHECK_EXTENSION64 or something, but I
honestly don't see the point.  The odds of guest_memfd supporting >32 flags are
small, and the odds of that happening in the next ~5 years are basically zero.
All so that userspace can make one syscall instead of two for a path that isn't
remotely performance critical.
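
For scale, the "two syscalls" fallback amounts to something like the sketch below: one query for bits 0..31 (KVM_CAP_GUEST_MEMFD_FLAGS) and one for bits 32..63 (a future KVM_CAP_GUEST_MEMFD_FLAGS2), stitched together in userspace. The query functions and the flag set are hypothetical stand-ins for KVM_CHECK_EXTENSION calls.

```c
#include <stdint.h>

/* Hypothetical kernel-side flag set, including one bit above bit 31. */
static const uint64_t kernel_flags = (1ULL << 0) | (1ULL << 1) | (1ULL << 35);

/* Stand-ins for two KVM_CHECK_EXTENSION queries, each limited to 32 bits. */
static uint32_t query_flags_lo(void) { return (uint32_t)kernel_flags; }          /* ..._FLAGS  */
static uint32_t query_flags_hi(void) { return (uint32_t)(kernel_flags >> 32); }  /* ..._FLAGS2 */

/* Userspace combines both halves into the full 64-bit set. */
static uint64_t gmem_supported_flags64(void)
{
    return ((uint64_t)query_flags_hi() << 32) | query_flags_lo();
}
```

The only cost over a hypothetical 64-bit KVM_CHECK_EXTENSION is one extra ioctl on a setup path.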

So while I agree that being able to enumerate 64 flags from the get-go would be
nice to have, it's simply not worth the effort (unless someone has a clever idea).

> 3) The reservation scheme for upstream should ideally be LSBs first
> for the new caps/flags.

We're getting way ahead of ourselves.  Nothing needs KVM_CAP_GUEST_MEMFD_CAPS at
this time, so there's nothing to discuss.

> guest_memfd will achieve multiple features in future, both upstream
> and in out-of-tree versions to deploy features before they make their

When it comes to upstream uAPI and uABI, out-of-tree kernel code is irrelevant.

> way upstream. Generally the scheme followed by out-of-tree versions is
> to define a custom UAPI that won't conflict with upstream UAPIs in
> near future. Having a namespace of 32 values gives little space to
> avoid the conflict, e.g. features like hugetlb support will have to
> eat up at least 5 bits from the flags [1].

Why on earth would out-of-tree code use KVM_CAP_GUEST_MEMFD_FLAGS?   Providing
infrastructure to support an infinite (quite literally) number of out-of-tree
capabilities and sub-ioctls, with practically zero chance of conflict, is not
difficult.  See internal b/378111418.

But as above, this is not upstream's problem to solve.

> [1] https://elixir.bootlin.com/linux/v6.17/source/include/uapi/asm-generic/hugetlb_encode.h#L20
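
For context on [1]: hugetlb_encode.h defines HUGETLB_FLAG_ENCODE_SHIFT as 26 and HUGETLB_FLAG_ENCODE_MASK as 0x3f, i.e. log2(page size) is stored in a 6-bit field occupying bits 26..31 of a flags word. The sketch below reproduces that encoding. Whether a guest_memfd hugetlb flag would reuse this exact layout is an assumption; the helper functions are illustrative, not kernel APIs.

```c
#include <stdint.h>

/* Same values as include/uapi/asm-generic/hugetlb_encode.h. */
#define HUGETLB_FLAG_ENCODE_SHIFT 26
#define HUGETLB_FLAG_ENCODE_MASK  0x3f

/* Hypothetical helper: encode log2(page size) into a 32-bit flags word. */
static uint32_t hugetlb_encode(unsigned int log2_size)
{
    return ((uint32_t)log2_size & HUGETLB_FLAG_ENCODE_MASK) << HUGETLB_FLAG_ENCODE_SHIFT;
}

/* Hypothetical helper: recover the page size in bytes from a flags word. */
static uint64_t hugetlb_decode_size(uint32_t flags)
{
    unsigned int log2_size = (flags >> HUGETLB_FLAG_ENCODE_SHIFT) & HUGETLB_FLAG_ENCODE_MASK;
    return 1ULL << log2_size;
}
```

For example, a 2MB page encodes as 21 << 26, and the field as a whole reserves bits 26..31 (mask 0xFC000000) of a 32-bit flag namespace.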