lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6b5pkr4eh3l6c2ovp6t2m7phonp4kr2z5k5facrsktcmsyztqo@2hjgi7c455km>
Date: Wed, 14 May 2025 09:43:06 +0300
From: "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>
To: "Huang, Kai" <kai.huang@...el.com>
Cc: "Zhao, Yan Y" <yan.y.zhao@...el.com>, 
	"Edgecombe, Rick P" <rick.p.edgecombe@...el.com>, "seanjc@...gle.com" <seanjc@...gle.com>, 
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "bp@...en8.de" <bp@...en8.de>, "x86@...nel.org" <x86@...nel.org>, 
	"mingo@...hat.com" <mingo@...hat.com>, "tglx@...utronix.de" <tglx@...utronix.de>, 
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>, 
	"Yamahata, Isaku" <isaku.yamahata@...el.com>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, 
	"linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>
Subject: Re: [RFC, PATCH 08/12] KVM: x86/tdp_mmu: Add phys_prepare() and
 phys_cleanup() to kvm_x86_ops

On Wed, May 14, 2025 at 12:00:17AM +0000, Huang, Kai wrote:
> On Mon, 2025-05-12 at 12:55 +0300, Kirill A. Shutemov wrote:
> > On Fri, May 09, 2025 at 09:25:58AM +0800, Yan Zhao wrote:
> > > On Thu, May 08, 2025 at 04:23:56PM +0300, Kirill A. Shutemov wrote:
> > > > On Tue, May 06, 2025 at 07:55:17PM +0800, Yan Zhao wrote:
> > > > > On Fri, May 02, 2025 at 04:08:24PM +0300, Kirill A. Shutemov wrote:
> > > > > > The functions kvm_x86_ops::link_external_spt() and
> > > > > > kvm_x86_ops::set_external_spte() are used to assign new memory to a VM.
> > > > > > When using TDX with Dynamic PAMT enabled, the assigned memory must be
> > > > > > covered by PAMT.
> > > > > > 
> > > > > > The new function kvm_x86_ops::phys_prepare() is called before
> > > > > > link_external_spt() and set_external_spte() to ensure that the memory is
> > > > > > ready to be assigned to the virtual machine. In the case of TDX, it
> > > > > > makes sure that the memory is covered by PAMT.
> > > > > > 
> > > > > > kvm_x86_ops::phys_prepare() is called in a context where struct kvm_vcpu
> > > > > > is available, allowing the implementation to allocate memory from a
> > > > > > per-VCPU pool.
> > > > > > 
> > > > > Why not invoke phys_prepare() and phys_cleanup() in set_external_spte_present()?
> > > > > Or in tdx_sept_set_private_spte()/tdx_sept_link_private_spt()?
> > > > 
> > > > Because the memory pool we allocated from is per-vcpu and we lost access
> > > > to vcpu by then. And not all callers provide vcpu.
> > > Maybe we can get vcpu via kvm_get_running_vcpu(), as in [1].
> > > Then for callers not providing vcpu (where vcpu is NULL), we can use per-KVM
> > > cache? 
> > 
> > Hm. I was not aware of kvm_get_running_vcpu(). Will play with it, thanks.
> 
> I am not sure why per-vcpu cache matters.
> 
> For non-leaf SEPT pages, AFAICT the "vcpu->arch.mmu_external_spt_cache" is just
> an empty cache, and eventually __get_free_page() is used to allocate in:
>                                                                                             
>   sp->external_spt = 
> 	kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_external_spt_cache);
> 
> So why not we actually create a kmem_cache for it with an actual 'ctor', and we
> can call tdx_alloc_page() in that.  This makes sure when the "external_spt" is
> allocated, the underneath PAMT entry is there.

This would make hard to debug PAMT memory leaks. external_spt pages in the
pool will have PAMT memory tied to them, so we will have non-zero PAMT
memory usage with zero TDs running.

> For the last level guest memory page, similar to SEV, we can hook the
> kvm_arch_gmem_prepare() to call tdx_alloc_page() to make PAMT entry ready.

I don't think kvm_arch_gmem_prepare() is right place to allocate PAMT
memory. THPs are dynamic and page order can change due to split or
collapse between the time the page is allocated and gets mapped into EPT.
I am not sure if SEV code is correct in this regard.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ