Message-ID: <6a7f0639-78fc-4721-8d84-6224c83c07d2@intel.com>
Date: Wed, 14 May 2025 13:11:31 +1200
From: "Huang, Kai" <kai.huang@...el.com>
To: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
<pbonzini@...hat.com>, <seanjc@...gle.com>
CC: <rick.p.edgecombe@...el.com>, <isaku.yamahata@...el.com>,
<yan.y.zhao@...el.com>, <tglx@...utronix.de>, <mingo@...hat.com>,
<bp@...en8.de>, <dave.hansen@...ux.intel.com>, <kvm@...r.kernel.org>,
<x86@...nel.org>, <linux-coco@...ts.linux.dev>,
<linux-kernel@...r.kernel.org>
Subject: Re: [RFC, PATCH 11/12] KVM: TDX: Reclaim PAMT memory
On 3/05/2025 1:08 am, Kirill A. Shutemov wrote:
> The PAMT memory holds metadata for TDX-protected memory. With Dynamic
> PAMT, PAMT_4K is allocated on demand. The kernel supplies the TDX module
> with a few pages that cover 2M of host physical memory.
>
> PAMT memory can be reclaimed when the last user is gone. It can happen
> in a few code paths:
>
> - On TDH.PHYMEM.PAGE.RECLAIM in tdx_reclaim_td_control_pages() and
> tdx_reclaim_page().
>
> - On TDH.MEM.PAGE.REMOVE in tdx_sept_drop_private_spte().
>
> - In tdx_sept_zap_private_spte() for pages that were in the queue to be
> added with TDH.MEM.PAGE.ADD, but it never happened due to an error.
>
> Add tdx_pamt_put() in these code paths.
IMHO, instead of explicitly hooking tdx_pamt_put() into various places, we
should just call tdx_free_page() for the pages that were allocated by
tdx_alloc_page() (i.e., control pages, SEPT pages).

That means, IMHO, we should do PAMT allocation/free when we actually
*allocate* and *free* the target TDX private page(s). I.e., we should:

- For TDX private pages with normal kernel allocation (control pages,
  SEPT pages etc), use tdx_alloc_page() and tdx_free_page().

- For TDX private pages in the page cache, i.e., guest_memfd, since we
  cannot use tdx_{alloc|free}_page(), hook the guest_memfd code to call
  tdx_pamt_{get|put}().

(I wish there were a way to unify the above two as well, but I don't have a
simple way to do that.)

I believe this can help simplify the code.
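As a rough illustration of the pairing described above, here is a minimal
userspace model, not kernel code. All model_* names, the fixed-size table and
the void * page type are assumptions made for the sketch; model_pamt_get() and
model_pamt_put() only stand in for tdx_pamt_get() and tdx_pamt_put(), whose
real implementation talks to the TDX module. The point is that the alloc/free
wrappers take and drop the PAMT reference, so callers never do it by hand.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace model only; sizes and names below are assumptions. */
#define MODEL_PAGE_SIZE   4096UL
#define MODEL_REGION_SIZE (2UL * 1024 * 1024) /* one PAMT_4K backing covers 2M */
#define MODEL_NR_REGIONS  64                  /* hypothetical table size */

struct model_region {
	uintptr_t base;     /* 2M-aligned base; only valid while users > 0 */
	unsigned int users; /* pages in this region holding a PAMT reference */
};

static struct model_region model_regions[MODEL_NR_REGIONS];
static unsigned int model_backed; /* regions that currently have PAMT backing */

/* Find the slot tracking @base; optionally claim an unused slot for it. */
static struct model_region *model_find(uintptr_t base, int create)
{
	struct model_region *unused = NULL;

	for (size_t i = 0; i < MODEL_NR_REGIONS; i++) {
		struct model_region *r = &model_regions[i];

		if (r->users > 0 && r->base == base)
			return r;
		if (r->users == 0 && !unused)
			unused = r;
	}
	if (!create || !unused)
		return NULL;
	unused->base = base;
	return unused;
}

/* Stand-in for tdx_pamt_get(): the first user of a 2M region adds backing. */
int model_pamt_get(void *page)
{
	uintptr_t base = (uintptr_t)page & ~(MODEL_REGION_SIZE - 1);
	struct model_region *r = model_find(base, 1);

	if (!r)
		return -1;
	if (r->users++ == 0)
		model_backed++;
	return 0;
}

/* Stand-in for tdx_pamt_put(): the last user of a 2M region drops backing. */
void model_pamt_put(void *page)
{
	uintptr_t base = (uintptr_t)page & ~(MODEL_REGION_SIZE - 1);
	struct model_region *r = model_find(base, 0);

	assert(r && r->users > 0);
	if (--r->users == 0)
		model_backed--;
}

/* Allocation wrapper: take the PAMT reference together with the page. */
void *model_alloc_page(void)
{
	void *page = aligned_alloc(MODEL_PAGE_SIZE, MODEL_PAGE_SIZE);

	if (!page)
		return NULL;
	if (model_pamt_get(page)) {
		free(page);
		return NULL;
	}
	return page;
}

/* Free wrapper: drop the PAMT reference together with the page. */
void model_free_page(void *page)
{
	if (!page)
		return;
	model_pamt_put(page);
	free(page);
}

unsigned int model_pamt_backed_regions(void)
{
	return model_backed;
}
```

In this model a caller that pairs model_alloc_page() with model_free_page()
never touches the reference count directly, which is the simplification
argued for above.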
So, ...
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> ---
> arch/x86/kvm/vmx/tdx.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 0f06ae7ff6b9..352f7b41f611 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -487,8 +487,11 @@ static int tdx_reclaim_page(struct page *page)
>  	int r;
>
>  	r = __tdx_reclaim_page(page);
> -	if (!r)
> +	if (!r) {
>  		tdx_clear_page(page);
> +		tdx_pamt_put(page);
> +	}
> +
>  	return r;
>  }
>
... I think this change should be removed, and ...
[...]
> + tdx_pamt_put(kvm_tdx->td.tdr_page);
>
> __free_page(kvm_tdx->td.tdr_page);
... The above two should be just:

	tdx_free_page(kvm_tdx->td.tdr_page);

and ...
> kvm_tdx->td.tdr_page = NULL;
> @@ -1768,6 +1772,7 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
>  		return -EIO;
>  	}
>  	tdx_clear_page(page);
> +	tdx_pamt_put(page);
>  	tdx_unpin(kvm, page);
>  	return 0;
>  }
> @@ -1848,6 +1853,7 @@ static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn,
>  	if (tdx_is_sept_zap_err_due_to_premap(kvm_tdx, err, entry, level) &&
>  	    !KVM_BUG_ON(!atomic64_read(&kvm_tdx->nr_premapped), kvm)) {
>  		atomic64_dec(&kvm_tdx->nr_premapped);
> +		tdx_pamt_put(page);
>  		tdx_unpin(kvm, page);
>  		return 0;
>  	}
... the above should be removed too.

For PAMT associated with sp->external_spt, we can call tdx_pamt_put()
when we free sp->external_spt.

For PAMT associated with TDX memory in guest_memfd, we can have a
guest_memfd-specific a_ops->invalidate_folio() in which we can have a
hook opposite to kvm_gmem_prepare_folio() to do tdx_pamt_put(). That
should cover all the cases, right?
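A minimal userspace sketch of that prepare/invalidate symmetry follows. It is
a model under stated assumptions, not guest_memfd code: struct model_folio,
struct model_a_ops and the counter are hypothetical, and the real callbacks
take different arguments. It only shows the invariant being asked for, namely
that the invalidation hook drops exactly the reference that the prepare hook
took, no matter how many times either is called.

```c
#include <assert.h>

/* Hypothetical stand-in for a guest_memfd folio. */
struct model_folio {
	int pamt_held; /* nonzero while this folio holds a PAMT reference */
};

/* Hypothetical ops table; the kernel's a_ops has different signatures. */
struct model_a_ops {
	int  (*prepare)(struct model_folio *folio);
	void (*invalidate)(struct model_folio *folio);
};

int model_gmem_pamt_refs; /* models references taken via tdx_pamt_get() */

/* Prepare side: take the reference once, even if called repeatedly. */
int model_gmem_prepare(struct model_folio *folio)
{
	if (folio->pamt_held)
		return 0;
	model_gmem_pamt_refs++; /* stands in for tdx_pamt_get() */
	folio->pamt_held = 1;
	return 0;
}

/* Invalidate side: the mirror image, dropping the reference once. */
void model_gmem_invalidate(struct model_folio *folio)
{
	if (!folio->pamt_held)
		return;
	assert(model_gmem_pamt_refs > 0);
	model_gmem_pamt_refs--; /* stands in for tdx_pamt_put() */
	folio->pamt_held = 0;
}

const struct model_a_ops model_gmem_a_ops = {
	.prepare    = model_gmem_prepare,
	.invalidate = model_gmem_invalidate,
};
```

With the put living in the invalidation path, the zap and drop paths in the
model would not need to know about PAMT at all.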
Or is there anything I missed?