[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zip-JsAB5TIRDJVl@google.com>
Date: Thu, 25 Apr 2024 09:00:38 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: Isaku Yamahata <isaku.yamahata@...el.com>, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, michael.roth@....com, isaku.yamahata@...ux.intel.com
Subject: Re: [PATCH 09/11] KVM: guest_memfd: Add interface for populating gmem
pages with user data
On Thu, Apr 25, 2024, Paolo Bonzini wrote:
> On Thu, Apr 25, 2024 at 3:12 AM Isaku Yamahata <isaku.yamahata@...el.com> wrote:
> > > > get_user_pages_fast(source addr)
> > > > read_lock(mmu_lock)
> > > > kvm_tdp_mmu_get_walk_private_pfn(vcpu, gpa, &pfn);
> > > > if the page table doesn't map gpa, error.
> > > > TDH.MEM.PAGE.ADD()
> > > > TDH.MR.EXTEND()
> > > > read_unlock(mmu_lock)
> > > > put_page()
> > >
> > > Hmm, KVM doesn't _need_ to use invalidate_lock to protect against guest_memfd
> > > invalidation, but I also don't see why it would cause problems.
>
> The invalidate_lock is only needed to operate on the guest_memfd, but
> it's a rwsem so there are no risks of lock inversion.
>
> > > I.e. why not
> > > take mmu_lock() in TDX's post_populate() implementation?
> >
> > We can take the lock. Because we have already populated the GFN of guest_memfd,
> > we need to make kvm_gmem_populate() not pass FGP_CREAT_ONLY. Otherwise we'll
> > get -EEXIST.
>
> I don't understand why TDH.MEM.PAGE.ADD() cannot be called from the
> post-populate hook. Can the code for TDH.MEM.PAGE.ADD be shared
> between the memory initialization ioctl and the page fault hook in
> kvm_x86_ops?
Ah, because TDX is required to pre-fault the memory to establish the S-EPT walk,
and pre-faulting means guest_memfd()
Requiring that guest_memfd not have a page when initializing the guest image
seems wrong, i.e. I don't think we want FGP_CREAT_ONLY. And not just because I
am a fan of pre-faulting, I think the semantics are bad.
E.g. IIUC, doing fallocate() to ensure memory is available would cause LAUNCH_UPDATE
to fail. That's weird and has nothing to do with KVM_PRE_FAULT.
I don't understand why we want FGP_CREAT_ONLY semantics. Who cares if there's a
page allocated? KVM already checks that the page is unassigned in the RMP, so
why does guest_memfd care whether or not the page was _just_ allocated?
AFAIK, unwinding on failure is completely uninteresting, and arguably undesirable,
because undoing LAUNCH_UPDATE or PAGE.ADD will affect the measurement, i.e. there
is no scenario where deleting pages from guest_memfd would allow a restart/resume
of the build process to truly succeed.
Powered by blists - more mailing lists