[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aNq5y1BSWsjculYM@google.com>
Date: Mon, 29 Sep 2025 09:54:35 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Ackerley Tng <ackerleytng@...gle.com>
Cc: Fuad Tabba <tabba@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>, Janosch Frank <frankja@...ux.ibm.com>,
Claudio Imbrenda <imbrenda@...ux.ibm.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
David Hildenbrand <david@...hat.com>, roypat@...zon.co.uk,
Nikita Kalyazin <kalyazin@...zon.co.uk>
Subject: Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject
user page faults if not set
On Mon, Sep 29, 2025, Ackerley Tng wrote:
> Fuad Tabba <tabba@...gle.com> writes:
>
> > Hi Sean,
> >
> > On Fri, 26 Sept 2025 at 17:31, Sean Christopherson <seanjc@...gle.com> wrote:
> >>
> >> Add a guest_memfd flag to allow userspace to state that the underlying
> >> memory should be configured to be shared by default, and reject user page
> >> faults if the guest_memfd instance's memory isn't shared by default.
> >> Because KVM doesn't yet support in-place private<=>shared conversions, all
> >> guest_memfd memory effectively follows the default state.
> >>
> >> Alternatively, KVM could deduce the default state based on MMAP, which for
> >> all intents and purposes is what KVM currently does. However, implicitly
> >> deriving the default state based on MMAP will result in a messy ABI when
> >> support for in-place conversions is added.
> >>
> >> For x86 CoCo VMs, which don't yet support MMAP, memory is currently private
> >> by default (otherwise the memory would be unusable). If MMAP implies
> >> memory is shared by default, then the default state for CoCo VMs will vary
> >> based on MMAP, and from userspace's perspective, will change when in-place
> >> conversion support is added. I.e. to maintain guest<=>host ABI, userspace
> >> would need to immediately convert all memory from shared=>private, which
> >> is both ugly and inefficient. The inefficiency could be avoided by adding
> >> a flag to state that memory is _private_ by default, irrespective of MMAP,
> >> but that would lead to an equally messy and hard to document ABI.
> >>
> >> Bite the bullet and immediately add a flag to control the default state so
> >> that the effective behavior is explicit and straightforward.
> >>
>
> I like having this flag, but didn't propose this because I thought folks
> depending on the default being shared (Patrick/Nikita) might have their
> usage broken.
mmap() support hasn't landed upstream, so as far as the upstream kernel is
concerned, there is no userspace to break. Which is exactly why I want to land
this (or something like it) in 6.18, before GUEST_MEMFD_FLAG_MMAP is officially
released.
> >> Fixes: 3d3a04fad25a ("KVM: Allow and advertise support for host mmap() on guest_memfd files")
> >> Cc: David Hildenbrand <david@...hat.com>
> >> Cc: Fuad Tabba <tabba@...gle.com>
> >> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> >> ---
> >> Documentation/virt/kvm/api.rst | 10 ++++++++--
> >> include/uapi/linux/kvm.h | 3 ++-
> >> tools/testing/selftests/kvm/guest_memfd_test.c | 5 +++--
> >> virt/kvm/guest_memfd.c | 6 +++++-
> >> 4 files changed, 18 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> >> index c17a87a0a5ac..4dfe156bbe3c 100644
> >> --- a/Documentation/virt/kvm/api.rst
> >> +++ b/Documentation/virt/kvm/api.rst
> >> @@ -6415,8 +6415,14 @@ guest_memfd range is not allowed (any number of memory regions can be bound to
> >> a single guest_memfd file, but the bound ranges must not overlap).
> >>
> >> When the capability KVM_CAP_GUEST_MEMFD_MMAP is supported, the 'flags' field
> >> -supports GUEST_MEMFD_FLAG_MMAP. Setting this flag on guest_memfd creation
> >> -enables mmap() and faulting of guest_memfd memory to host userspace.
> >> +supports GUEST_MEMFD_FLAG_MMAP and GUEST_MEMFD_FLAG_DEFAULT_SHARED. Setting
> >
> > There's an extra space between `and` and `GUEST_MEMFD_FLAG_DEFAULT_SHARED`.
> >
>
> +1 on this. Also, would you consider putting the concept of "at creation
> time" or "at initialization time" into the name of the flag?
Yah, GUEST_MEMFD_FLAG_INIT_SHARED?
> "Default" could be interpreted as "whenever a folio is allocated for
> this guest_memfd", the memory the folio represents is by default
> shared.
>
> What we want to represent is that when the guest_memfd is created,
> memory at all indices are initialized as shared.
>
> Looking a bit further, when conversion is supported, if this flag is not
> specified, then all the indices are initialized as private, right?
Correct, which is the current (pre-6.18) behavior.
> >> +the MMAP flag on guest_memfd creation enables mmap() and faulting of guest_memfd
> >> +memory to host userspace (so long as the memory is currently shared). Setting
> >> +DEFAULT_SHARED makes all guest_memfd memory shared by default (versus private
> >> +by default). Note! Because KVM doesn't yet support in-place private<=>shared
> >> +conversions, DEFAULT_SHARED must be specified in order to fault memory into
> >> +userspace page tables. This limitation will go away when in-place conversions
> >> +are supported.
> >
> > I think that a more accurate (and future proof) description of the
> > mmap flag could be something along the lines of:
> >
>
> +1 on these suggestions, I agree that making the concepts of SHARED vs
> MMAP orthogonal from the start is more future proof.
>
> > + Setting GUEST_MEMFD_FLAG_MMAP enables using mmap() on the file descriptor.
> >
> > + Setting GUEST_MEMFD_FLAG_DEFAULT_SHARED makes all memory in the file shared
> > + by default
>
> See above, I'd prefer clarifying this as "at initialization time" or
> something similar.
Roger that.
> > At least for now, GUEST_MEMFD_FLAG_DEFAULT_SHARED and GUEST_MEMFD_FLAG_MMAP
> > don't make sense without each other. Is it worth checking for that, at
> > least until we have in-place conversion? Having only
> > GUEST_MEMFD_FLAG_DEFAULT_SHARED set, but GUEST_MEMFD_FLAG_MMAP, isn't a
> > useful combination.
Heh, that's exactly how I coded things up to start:
/*
* TODO: Drop the restriction that memory must be shared by default
* once in-place conversions are supported.
*/
if (flags & GUEST_MEMFD_FLAG_MMAP &&
!(flags & GUEST_MEMFD_FLAG_DEFAULT_SHARED))
return -EINVAL;
but if we go that route, then dropping the restriction would result in an ABI
change for non-CoCo VMs. The odds of such an ABI changes breaking userspace are
basically zero, but I couldn't think of any reason to risk it; userspace would
need to specify MMAP+SHARED either way.
And on the flip side, not enforcing the flags at the time of creation allows us
to test that user page faults to private memory are rejected. It's not a ton of
meaningful coverage, but it's not nothing either. And from a code perspective,
the diffs when in-place conversions are added are quite nice, as the concepts
don't change (user faults to private memory are disallowed), only the mechanics
change, i.e. the diffs highlight what all needs to happen to support conversions
without the extra noise of a change in overall semantics.
Powered by blists - more mailing lists