Message-ID: <20240328053432.GO2444378@ls.amr.corp.intel.com>
Date: Wed, 27 Mar 2024 22:34:32 -0700
From: Isaku Yamahata <isaku.yamahata@...el.com>
To: "Huang, Kai" <kai.huang@...el.com>
Cc: Isaku Yamahata <isaku.yamahata@...el.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>,
Paolo Bonzini <pbonzini@...hat.com>,
"Aktas, Erdem" <erdemaktas@...gle.com>,
Sean Christopherson <seanjc@...gle.com>,
Sagi Shahar <sagis@...gle.com>, "Chen, Bo2" <chen.bo@...el.com>,
"Yuan, Hang" <hang.yuan@...el.com>,
"Zhang, Tina" <tina.zhang@...el.com>,
Sean Christopherson <sean.j.christopherson@...el.com>,
isaku.yamahata@...ux.intel.com
Subject: Re: [PATCH v19 038/130] KVM: TDX: create/destroy VM structure
On Thu, Mar 28, 2024 at 02:49:56PM +1300,
"Huang, Kai" <kai.huang@...el.com> wrote:
>
>
> On 28/03/2024 11:53 am, Isaku Yamahata wrote:
> > On Tue, Mar 26, 2024 at 02:43:54PM +1300,
> > "Huang, Kai" <kai.huang@...el.com> wrote:
> >
> > > ... continue the previous review ...
> > >
> > > > +
> > > > +static void tdx_reclaim_control_page(unsigned long td_page_pa)
> > > > +{
> > > > + WARN_ON_ONCE(!td_page_pa);
> > >
> > > From the name 'td_page_pa' we cannot tell whether it is a control page, but
> > > this function is only intended for control page AFAICT, so perhaps a more
> > > specific name.
> > >
> > > > +
> > > > + /*
> > > > + * TDCX are being reclaimed. TDX module maps TDCX with HKID
> > >
> > > "are" -> "is".
> > >
> > > Are you sure it is TDCX, but not TDCS?
> > >
> > > AFAICT TDCX is the control structure for 'vcpu', but here you are handling
> > > the control structure for the VM.
> >
> > TDCS, TDVPR, and TDCX. Will update the comment.
>
> But TDCX, TDVPR are vcpu-scoped. Do you want to mention them _here_?
So the patch that frees TDVPR and TDCX will update this comment.
> Otherwise you will have to explain them.
>
> [...]
>
> > > > +
> > > > +void tdx_mmu_release_hkid(struct kvm *kvm)
> > > > +{
> > > > + bool packages_allocated, targets_allocated;
> > > > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > > > + cpumask_var_t packages, targets;
> > > > + u64 err;
> > > > + int i;
> > > > +
> > > > + if (!is_hkid_assigned(kvm_tdx))
> > > > + return;
> > > > +
> > > > + if (!is_td_created(kvm_tdx)) {
> > > > + tdx_hkid_free(kvm_tdx);
> > > > + return;
> > > > + }
> > >
> > > I lost tracking what does "td_created()" mean.
> > >
> > > I guess it means: KeyID has been allocated to the TDX guest, but not yet
> > > programmed/configured.
> > >
> > > Perhaps add a comment to remind the reviewer?
> >
> > As Chao suggested, will introduce state machine for vm and vcpu.
> >
> > https://lore.kernel.org/kvm/ZfvI8t7SlfIsxbmT@chao-email/
>
> Could you elaborate what will the state machine look like?
>
> I need to understand it.
Not yet. Chao only proposed introducing a state machine; right now it's just an
idea.
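Very roughly, and just to illustrate the idea (the names below are made up,
nothing is decided yet), the VM-scoped part could look something like:

	/* Illustrative sketch only, not a proposal. */
	enum kvm_tdx_state {
		TD_STATE_UNINITIALIZED,	/* no HKID, no TDR page */
		TD_STATE_HKID_ASSIGNED,	/* HKID allocated, TD not yet created */
		TD_STATE_CREATED,	/* TDH.MNG.CREATE etc. done */
		TD_STATE_TEARDOWN,	/* HKID freed or being freed */
	};

A similar enum could exist for the vcpu. Again, this is only a sketch of what
Chao suggested, not something we have agreed on.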
> > How about this?
> >
> > /*
> > * We need three SEAMCALLs, TDH.MNG.VPFLUSHDONE(), TDH.PHYMEM.CACHE.WB(), and
> > * TDH.MNG.KEY.FREEID() to free the HKID.
> > * Other threads can remove pages from TD. When the HKID is assigned, we need
> > * to use TDH.MEM.SEPT.REMOVE() or TDH.MEM.PAGE.REMOVE().
> > * TDH.PHYMEM.PAGE.RECLAIM() is needed when the HKID is free. Get lock to not
> > * present transient state of HKID.
> > */
>
> Could you elaborate why it is still possible to have other thread removing
> pages from TD?
>
> I am probably missing something, but the thing I don't understand is why
> this function is triggered by MMU release? All the things done in this
> function don't seem to be related to MMU at all.
KVM releases EPT pages on MMU notifier release; kvm_mmu_zap_all() does that. If
we follow that path, kvm_mmu_zap_all() zaps all the Secure EPT pages with
TDH.MEM.SEPT.REMOVE() or TDH.MEM.PAGE.REMOVE(). Because
TDH.MEM.{SEPT, PAGE}.REMOVE() is slow, we free the HKID before kvm_mmu_zap_all()
so that the pages can be reclaimed with the cheaper TDH.PHYMEM.PAGE.RECLAIM().
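To illustrate the cost difference (a rough sketch only, not the actual patch
code; the function names below are simplified for illustration):

	/*
	 * Sketch: how a private page is taken back from the guest.  While
	 * the HKID is still assigned, the page is mapped in the Secure EPT
	 * and must go through the slow TDH.MEM.PAGE.REMOVE (or
	 * TDH.MEM.SEPT.REMOVE for S-EPT pages).  Once TDH.MNG.KEY.FREEID
	 * has run, the same page can be taken back with the cheap
	 * TDH.PHYMEM.PAGE.RECLAIM.
	 */
	static int tdx_zap_private_page(struct kvm_tdx *kvm_tdx, gfn_t gfn, hpa_t hpa)
	{
		if (is_hkid_assigned(kvm_tdx))
			/* Slow path: page is still encrypted with the HKID. */
			return tdx_mem_page_remove(kvm_tdx, gfn, hpa);

		/* Fast path: HKID already freed, just reclaim the page. */
		return tdx_reclaim_page(hpa);
	}

That is why it's preferable for tdx_mmu_release_hkid() to run before the bulk
of the zapping.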
> IIUC, by reaching here, you must already have done VPFLUSHDONE, which should
> be called when you free vcpu?
Not necessarily.
> Freeing vcpus is done in
> kvm_arch_destroy_vm(), which is _after_ mmu_notifier->release(), in which
> this tdx_mmu_release_keyid() is called?
guest memfd complicates things. The race is between guest memfd release and mmu
notifier release. kvm_arch_destroy_vm() is called after closing all kvm fds,
including guest memfd.

Here is an example. Let's say we have fds for vhost, guest_memfd, kvm vcpu,
and kvm vm, and the process is exiting. Note that vhost holds a reference to
the mm to access guest (shared) memory.

exit_mmap():
  Usually the mmu notifier release fires here.  But not yet, because of the
  reference vhost holds.

exit_files():
  - Close the vhost fd.  vhost starts a timer to issue mmput().
  - Close guest_memfd.  kvm_gmem_release() calls kvm_mmu_unmap_gfn_range(),
    which eventually calls TDH.MEM.SEPT.REMOVE() and TDH.MEM.PAGE.REMOVE().
    This takes time because it processes the whole guest memory.  Finally it
    calls kvm_put_kvm().
  - While unmapping on behalf of guest memfd, the vhost timer fires and calls
    mmput().  That triggers the mmu notifier release.
  - Close the kvm vcpu/vm fds.  They call kvm_put_kvm().  The last one calls
    kvm_destroy_vm().

It's ideal to free the HKID first for efficiency, but KVM has no control over
the order in which the fds are closed.
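That also means tdx_mmu_release_hkid() cannot assume anything about which fd is
closed first.  A sketch of what it has to provide (not the patch itself, the
body is elided):

	/*
	 * Sketch only: because KVM has no control over the fd close order,
	 * tdx_mmu_release_hkid() must be safe to reach from either the mmu
	 * notifier release path or VM destruction, and be a no-op the second
	 * time.  The is_hkid_assigned() check at the top of the quoted hunk
	 * is what provides that.
	 */
	void tdx_mmu_release_hkid(struct kvm *kvm)
	{
		struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);

		if (!is_hkid_assigned(kvm_tdx))
			return;	/* already released on the other path */

		/*
		 * ... TDH.MNG.VPFLUSHDONE, TDH.PHYMEM.CACHE.WB per package,
		 * TDH.MNG.KEY.FREEID ...
		 */
	}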
> But here we are depending vcpus to be freed before tdx_mmu_release_hkid()?
Not necessarily.
--
Isaku Yamahata <isaku.yamahata@...el.com>