[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADrL8HUMm0PUqx-xNdPvSMP6z4gzs2OTUJG1sdyy88D-XWxT3g@mail.gmail.com>
Date: Wed, 28 May 2025 11:21:21 -0400
From: James Houghton <jthoughton@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Jonathan Corbet <corbet@....net>, Marc Zyngier <maz@...nel.org>,
Oliver Upton <oliver.upton@...ux.dev>, Yan Zhao <yan.y.zhao@...el.com>,
Nikita Kalyazin <kalyazin@...zon.com>, Anish Moorthy <amoorthy@...gle.com>,
Peter Gonda <pgonda@...gle.com>, Peter Xu <peterx@...hat.com>,
David Matlack <dmatlack@...gle.com>, wei.w.wang@...el.com, kvm@...r.kernel.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev
Subject: Re: [PATCH v2 01/13] KVM: Add KVM_MEM_USERFAULT memslot flag and bitmap
On Tue, May 6, 2025 at 8:01 PM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Thu, Jan 09, 2025, James Houghton wrote:
> > Use one of the 14 reserved u64s in struct kvm_userspace_memory_region2
> > for the user to provide `userfault_bitmap`.
> >
> > The memslot flag indicates if KVM should be reading from the
> > `userfault_bitmap` field from the memslot. The user is permitted to
> > provide a bogus pointer. If the pointer cannot be read from, we will
> > return -EFAULT (with no other information) back to the user.
>
> For the uAPI+infrastructure changelog, please elaborate on the design goals and
> choices. The "what" is pretty obvious from the patch; describe why this is being
> added.
>
> > Signed-off-by: James Houghton <jthoughton@...gle.com>
> > ---
> > include/linux/kvm_host.h | 14 ++++++++++++++
> > include/uapi/linux/kvm.h | 4 +++-
> > virt/kvm/Kconfig | 3 +++
> > virt/kvm/kvm_main.c | 35 +++++++++++++++++++++++++++++++++++
> > 4 files changed, 55 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 401439bb21e3..f7a3dfd5e224 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -590,6 +590,7 @@ struct kvm_memory_slot {
> > unsigned long *dirty_bitmap;
> > struct kvm_arch_memory_slot arch;
> > unsigned long userspace_addr;
> > + unsigned long __user *userfault_bitmap;
> > u32 flags;
> > short id;
> > u16 as_id;
> > @@ -724,6 +725,11 @@ static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
> > }
> > #endif
> >
> > +static inline bool kvm_has_userfault(struct kvm *kvm)
> > +{
> > + return IS_ENABLED(CONFIG_HAVE_KVM_USERFAULT);
> > +}
>
> Eh, don't think we need this wrapper. Just check the CONFIG_xxx manually in the
> one or two places where code isn't guarded by an #ifdef.
>
> > struct kvm_memslots {
> > u64 generation;
> > atomic_long_t last_used_slot;
> > @@ -2553,4 +2559,12 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> > struct kvm_pre_fault_memory *range);
> > #endif
> >
> > +int kvm_gfn_userfault(struct kvm *kvm, struct kvm_memory_slot *memslot,
> > + gfn_t gfn);
> > +
> > +static inline bool kvm_memslot_userfault(struct kvm_memory_slot *memslot)
>
> I strongly prefer kvm_is_userfault_memslot(). KVM's weird kvm_memslot_<flag>()
> nomenclature comes from ancient code, i.e. isn't something I would follow.
>
> > +{
> > + return memslot->flags & KVM_MEM_USERFAULT;
>
> I think it's worth checking for a non-NULL memslot, even if all current callers
> pre-check for a slot.
>
> > @@ -2042,6 +2051,9 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > if (r)
> > goto out;
> > }
> > + if (mem->flags & KVM_MEM_USERFAULT)
> > + new->userfault_bitmap =
> > + (unsigned long __user *)(unsigned long)mem->userfault_bitmap;
>
> if (mem->flags & KVM_MEM_USERFAULT)
> new->userfault_bitmap = u64_to_user_ptr(mem->userfault_bitmap);
Applied this change to the other cast (where we do access_ok()) as well, thanks!
>
> > r = kvm_set_memslot(kvm, old, new, change);
> > if (r)
> > @@ -6426,3 +6438,26 @@ void kvm_exit(void)
> > kvm_irqfd_exit();
> > }
> > EXPORT_SYMBOL_GPL(kvm_exit);
> > +
> > +int kvm_gfn_userfault(struct kvm *kvm, struct kvm_memory_slot *memslot,
> > + gfn_t gfn)
>
> I think this series is the perfect opportunity (read: victim) to introduce a
> common "struct kvm_page_fault". With a common structure to provide the gfn, slot,
> write, exec, and is_private fields, this helper can handle the checks and the call
> to kvm_prepare_memory_fault_exit().
>
> And with that in place, I would vote to name this something like kvm_do_userfault(),
> return a boolean, and let the caller return -EFAULT.
Returning 'true' from kvm_do_userfault() without a
kvm_prepare_memory_fault_exit() looked a bit strange at first, but I
don't have strong feelings. I'll add a small comment there.
>
> For making "struct kvm_page_fault" common, one thought would be to have arch code
> define the entire struct, and simply assert on the few fields that common KVM needs
> being defined by arch code. And wrap all references in CONFIG_KVM_GENERIC_PAGE_FAULT.
>
> I don't expect there will be a huge number of fields that common KVM needs, i.e. I
> don't think the maintenance burden of punting to arch code will be high. And letting
> arch code own the entire struct means we don't need to have e.g. fault->arch.present
> vs. fault->write in KVM x86, which to me is a big net negative for readability.
>
> I'll respond to the cover letter with an attachment of seven patches to sketch out
> the idea.
Looks great! Thanks very much!
>
> > +{
> > + unsigned long bitmap_chunk = 0;
> > + off_t offset;
> > +
> > + if (!kvm_memslot_userfault(memslot))
> > + return 0;
> > +
> > + if (WARN_ON_ONCE(!memslot->userfault_bitmap))
> > + return 0;
>
> '0' is technically a valid userspace address. I'd just drop this. If we have a
> KVM bug that results in failure to generate usefaults, we'll notice quite quickly.
>
> > +
> > + offset = gfn - memslot->base_gfn;
> > +
> > + if (copy_from_user(&bitmap_chunk,
> > + memslot->userfault_bitmap + offset / BITS_PER_LONG,
> > + sizeof(bitmap_chunk)))
>
> Since the address is checked during memslot creation, I'm pretty sure this can
> use __get_user(). At the very least, it should be get_user().
Thanks! I agree, __get_user() should be fine.
>
> > + return -EFAULT;
> > +
> > + /* Set in the bitmap means that the gfn is userfault */
> > + return !!(bitmap_chunk & (1ul << (offset % BITS_PER_LONG)));
>
> test_bit()?
Thanks for all the feedback and applying it for me in those patches
you sent back. :)
Powered by blists - more mailing lists