Message-ID: <6b5pkr4eh3l6c2ovp6t2m7phonp4kr2z5k5facrsktcmsyztqo@2hjgi7c455km>
Date: Wed, 14 May 2025 09:43:06 +0300
From: "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>
To: "Huang, Kai" <kai.huang@...el.com>
Cc: "Zhao, Yan Y" <yan.y.zhao@...el.com>,
"Edgecombe, Rick P" <rick.p.edgecombe@...el.com>, "seanjc@...gle.com" <seanjc@...gle.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "bp@...en8.de" <bp@...en8.de>, "x86@...nel.org" <x86@...nel.org>,
"mingo@...hat.com" <mingo@...hat.com>, "tglx@...utronix.de" <tglx@...utronix.de>,
"pbonzini@...hat.com" <pbonzini@...hat.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"Yamahata, Isaku" <isaku.yamahata@...el.com>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>
Subject: Re: [RFC, PATCH 08/12] KVM: x86/tdp_mmu: Add phys_prepare() and
phys_cleanup() to kvm_x86_ops
On Wed, May 14, 2025 at 12:00:17AM +0000, Huang, Kai wrote:
> On Mon, 2025-05-12 at 12:55 +0300, Kirill A. Shutemov wrote:
> > On Fri, May 09, 2025 at 09:25:58AM +0800, Yan Zhao wrote:
> > > On Thu, May 08, 2025 at 04:23:56PM +0300, Kirill A. Shutemov wrote:
> > > > On Tue, May 06, 2025 at 07:55:17PM +0800, Yan Zhao wrote:
> > > > > On Fri, May 02, 2025 at 04:08:24PM +0300, Kirill A. Shutemov wrote:
> > > > > > The functions kvm_x86_ops::link_external_spt() and
> > > > > > kvm_x86_ops::set_external_spte() are used to assign new memory to a VM.
> > > > > > When using TDX with Dynamic PAMT enabled, the assigned memory must be
> > > > > > covered by PAMT.
> > > > > >
> > > > > > The new function kvm_x86_ops::phys_prepare() is called before
> > > > > > link_external_spt() and set_external_spte() to ensure that the memory is
> > > > > > ready to be assigned to the virtual machine. In the case of TDX, it
> > > > > > makes sure that the memory is covered by PAMT.
> > > > > >
> > > > > > kvm_x86_ops::phys_prepare() is called in a context where struct kvm_vcpu
> > > > > > is available, allowing the implementation to allocate memory from a
> > > > > > per-VCPU pool.
> > > > > >
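For reference, the hooks described above would presumably be declared
along these lines in struct kvm_x86_ops (the parameter lists here are
guesses, not taken from the patch itself):

	/* Assumed shape: prepare/release PAMT backing for @pfn. */
	int (*phys_prepare)(struct kvm_vcpu *vcpu, kvm_pfn_t pfn);
	void (*phys_cleanup)(kvm_pfn_t pfn);
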
> > > > > Why not invoke phys_prepare() and phys_cleanup() in set_external_spte_present()?
> > > > > Or in tdx_sept_set_private_spte()/tdx_sept_link_private_spt()?
> > > >
> > > > Because the memory pool we allocate from is per-vcpu and we have lost
> > > > access to the vcpu by then. And not all callers provide a vcpu.
> > > Maybe we can get the vcpu via kvm_get_running_vcpu(), as in [1].
> > > Then for callers not providing a vcpu (where vcpu is NULL), we can use a
> > > per-KVM cache?
> >
> > Hm. I was not aware of kvm_get_running_vcpu(). Will play with it, thanks.
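
Sketched out, the fallback Yan suggests could look something like this;
tdx_topup_vcpu_cache() and tdx_topup_kvm_cache() are made-up names for
the two pools:

	static int tdx_topup_phys_cache(struct kvm *kvm)
	{
		struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

		/* In vCPU context, top up the per-vCPU pool... */
		if (vcpu)
			return tdx_topup_vcpu_cache(vcpu);

		/* ...otherwise (e.g. ioctl path) fall back to a per-KVM cache. */
		return tdx_topup_kvm_cache(kvm);
	}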
>
> I am not sure why a per-vcpu cache matters.
>
> For non-leaf SEPT pages, AFAICT the "vcpu->arch.mmu_external_spt_cache" is just
> an empty cache, and eventually __get_free_page() is used for the allocation in:
>
>	sp->external_spt =
>		kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_external_spt_cache);
>
> So why don't we actually create a kmem_cache for it with an actual 'ctor',
> and call tdx_alloc_page() in that?  This makes sure that when the
> "external_spt" is allocated, the underlying PAMT entry is there.
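
For concreteness, that suggestion would be something like the following,
where tdx_pamt_get() is a stand-in for whatever adds the PAMT backing;
note that a slab 'ctor' returns void, so it has no way to report failure:

	static struct kmem_cache *external_spt_cache;

	/* Slab ctor: runs when a new object is initialized. */
	static void external_spt_ctor(void *obj)
	{
		/*
		 * Add PAMT backing for the page holding this object.
		 * NB: a ctor cannot propagate an error to the
		 * allocation site.
		 */
		tdx_pamt_get(virt_to_page(obj));
	}

	external_spt_cache = kmem_cache_create("external_spt", PAGE_SIZE,
					       PAGE_SIZE, 0, external_spt_ctor);
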
This would make it hard to debug PAMT memory leaks: external_spt pages
sitting in the pool would have PAMT memory tied to them, so we would see
non-zero PAMT memory usage with zero TDs running.
> For the last-level guest memory page, similar to SEV, we can hook
> kvm_arch_gmem_prepare() to call tdx_alloc_page() to make the PAMT entry
> ready.
I don't think kvm_arch_gmem_prepare() is the right place to allocate PAMT
memory. THPs are dynamic, and the page order can change due to a split or
collapse between the time the page is allocated and the time it gets
mapped into the EPT.

I am not sure the SEV code is correct in this regard.
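
For completeness, the hook Kai suggests would look roughly like the SEV
one below (tdx_pamt_get() is again a placeholder); the point above is
that the order seen here can be stale by the time the page is mapped:

	static int tdx_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn,
				    gfn_t gfn, int max_order)
	{
		/*
		 * PAMT backing would be sized for max_order here, but the
		 * actual mapping order can differ once the THP is split
		 * or collapsed.
		 */
		return tdx_pamt_get(pfn_to_page(pfn));
	}
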
--
Kiryl Shutsemau / Kirill A. Shutemov