Open Source and information security mailing list archives
Message-ID: <20240328053432.GO2444378@ls.amr.corp.intel.com>
Date: Wed, 27 Mar 2024 22:34:32 -0700
From: Isaku Yamahata <isaku.yamahata@...el.com>
To: "Huang, Kai" <kai.huang@...el.com>
Cc: Isaku Yamahata <isaku.yamahata@...el.com>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>,
	Paolo Bonzini <pbonzini@...hat.com>,
	"Aktas, Erdem" <erdemaktas@...gle.com>,
	Sean Christopherson <seanjc@...gle.com>,
	Sagi Shahar <sagis@...gle.com>, "Chen, Bo2" <chen.bo@...el.com>,
	"Yuan, Hang" <hang.yuan@...el.com>,
	"Zhang, Tina" <tina.zhang@...el.com>,
	Sean Christopherson <sean.j.christopherson@...el.com>,
	isaku.yamahata@...ux.intel.com
Subject: Re: [PATCH v19 038/130] KVM: TDX: create/destroy VM structure

On Thu, Mar 28, 2024 at 02:49:56PM +1300,
"Huang, Kai" <kai.huang@...el.com> wrote:

> 
> 
> On 28/03/2024 11:53 am, Isaku Yamahata wrote:
> > On Tue, Mar 26, 2024 at 02:43:54PM +1300,
> > "Huang, Kai" <kai.huang@...el.com> wrote:
> > 
> > > ... continue the previous review ...
> > > 
> > > > +
> > > > +static void tdx_reclaim_control_page(unsigned long td_page_pa)
> > > > +{
> > > > +	WARN_ON_ONCE(!td_page_pa);
> > > 
> > >  From the name 'td_page_pa' we cannot tell whether it is a control page, but
> > > this function is only intended for control page AFAICT, so perhaps a more
> > > specific name.
> > > 
> > > > +
> > > > +	/*
> > > > +	 * TDCX are being reclaimed.  TDX module maps TDCX with HKID
> > > 
> > > "are" -> "is".
> > > 
> > > Are you sure it is TDCX, but not TDCS?
> > > 
> > > AFAICT TDCX is the control structure for 'vcpu', but here you are handling
> > > the control structure for the VM.
> > 
> > TDCS, TDVPR, and TDCX.  Will update the comment.
> 
> But TDCX, TDVPR are vcpu-scoped.  Do you want to mention them _here_?

So the patch that frees TDVPR and TDCX will update this comment.


> Otherwise you will have to explain them.
> 
> [...]
> 
> > > > +
> > > > +void tdx_mmu_release_hkid(struct kvm *kvm)
> > > > +{
> > > > +	bool packages_allocated, targets_allocated;
> > > > +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > > > +	cpumask_var_t packages, targets;
> > > > +	u64 err;
> > > > +	int i;
> > > > +
> > > > +	if (!is_hkid_assigned(kvm_tdx))
> > > > +		return;
> > > > +
> > > > +	if (!is_td_created(kvm_tdx)) {
> > > > +		tdx_hkid_free(kvm_tdx);
> > > > +		return;
> > > > +	}
> > > 
> > > I lost tracking what does "td_created()" mean.
> > > 
> > > I guess it means: KeyID has been allocated to the TDX guest, but not yet
> > > programmed/configured.
> > > 
> > > Perhaps add a comment to remind the reviewer?
> > 
> > As Chao suggested, will introduce state machine for vm and vcpu.
> > 
> > https://lore.kernel.org/kvm/ZfvI8t7SlfIsxbmT@chao-email/
> 
> Could you elaborate what will the state machine look like?
> 
> I need to understand it.

Not yet. Chao only proposed introducing a state machine.  Right now it's just
an idea.


> > How about this?
> > 
> > /*
> >   * We need three SEAMCALLs, TDH.MNG.VPFLUSHDONE(), TDH.PHYMEM.CACHE.WB(), and
> >   * TDH.MNG.KEY.FREEID() to free the HKID.
> >   * Other threads can remove pages from TD.  When the HKID is assigned, we need
> >   * to use TDH.MEM.SEPT.REMOVE() or TDH.MEM.PAGE.REMOVE().
> >   * TDH.PHYMEM.PAGE.RECLAIM() is needed when the HKID is free.  Get lock to not
> >   * present transient state of HKID.
> >   */
> 
> Could you elaborate why it is still possible to have other thread removing
> pages from TD?
> 
> I am probably missing something, but the thing I don't understand is why
> this function is triggered by MMU release?  All the things done in this
> function don't seem to be related to MMU at all.

KVM releases EPT pages on MMU notifier release; kvm_mmu_zap_all() does that.
If we follow that path, kvm_mmu_zap_all() zaps all the Secure-EPT pages with
TDH.MEM.SEPT.REMOVE() or TDH.MEM.PAGE.REMOVE().  Because
TDH.MEM.{SEPT, PAGE}.REMOVE() is slow, we can free the HKID before
kvm_mmu_zap_all() so that the faster TDH.PHYMEM.PAGE.RECLAIM() can be used
instead.


> IIUC, by reaching here, you must already have done VPFLUSHDONE, which should
> be called when you free vcpu?

Not necessarily.


> Freeing vcpus is done in
> kvm_arch_destroy_vm(), which is _after_ mmu_notifier->release(), in which
> this tdx_mmu_release_keyid() is called?

guest memfd complicates things.  The race is between guest memfd release and
mmu notifier release.  kvm_arch_destroy_vm() is called after closing all kvm
fds, including guest memfd.

Here is an example.  Say we have fds for vhost, guest_memfd, kvm vcpu, and
kvm vm, and the process is exiting.  Note that vhost holds a reference on the
mm to access guest (shared) memory.

exit_mmap():
  Usually the mmu notifier release fires here, but not yet, because vhost
  still holds a reference on the mm.

exit_files():
  Close the vhost fd.  vhost starts a timer to issue mmput().

  Close guest_memfd.  kvm_gmem_release() calls kvm_mmu_unmap_gfn_range(),
    which eventually calls TDH.MEM.SEPT.REMOVE() and TDH.MEM.PAGE.REMOVE().
    This takes time because it processes the whole guest memory.  Finally it
    calls kvm_put_kvm().

  During the unmapping on behalf of guest memfd, the vhost timer fires and
  calls mmput(), which triggers the mmu notifier release.

  Close the kvm vcpu/vm fds.  They call kvm_put_kvm(); the last one calls
  kvm_destroy_vm().

Ideally we would free the HKID first for efficiency, but KVM has no control
over the order in which the fds are closed.


> But here we are depending vcpus to be freed before tdx_mmu_release_hkid()?

Not necessarily.
-- 
Isaku Yamahata <isaku.yamahata@...el.com>
