Message-ID: <aBAiCBmON0g0Qro1@yzhao56-desk.sh.intel.com>
Date: Tue, 29 Apr 2025 08:49:12 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Vishal Annapurve <vannapurve@...gle.com>
CC: <pbonzini@...hat.com>, <seanjc@...gle.com>,
	<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>, <x86@...nel.org>,
	<rick.p.edgecombe@...el.com>, <dave.hansen@...el.com>,
	<kirill.shutemov@...el.com>, <tabba@...gle.com>, <ackerleytng@...gle.com>,
	<quic_eberman@...cinc.com>, <michael.roth@....com>, <david@...hat.com>,
	<vbabka@...e.cz>, <jroedel@...e.de>, <thomas.lendacky@....com>,
	<pgonda@...gle.com>, <zhiquan1.li@...el.com>, <fan.du@...el.com>,
	<jun.miao@...el.com>, <ira.weiny@...el.com>, <isaku.yamahata@...el.com>,
	<xiaoyao.li@...el.com>, <binbin.wu@...ux.intel.com>, <chao.p.peng@...el.com>
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge
 pages

On Mon, Apr 28, 2025 at 05:17:16PM -0700, Vishal Annapurve wrote:
> On Wed, Apr 23, 2025 at 8:07 PM Yan Zhao <yan.y.zhao@...el.com> wrote:
> >
> > Increase the folio ref count before mapping a private page, and decrease
> > it after a mapping failure or after successfully removing a private page.
> >
> > The number of refs to inc/dec corresponds to the mapping/unmapping level,
> > ensuring the folio ref count remains balanced across entry splits and
> > merges.
> >
> > Signed-off-by: Xiaoyao Li <xiaoyao.li@...el.com>
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@...el.com>
> > Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
> > ---
> >  arch/x86/kvm/vmx/tdx.c | 19 ++++++++++---------
> >  1 file changed, 10 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index 355b21fc169f..e23dce59fc72 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -1501,9 +1501,9 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
> >         td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa);
> >  }
> >
> > -static void tdx_unpin(struct kvm *kvm, struct page *page)
> > +static void tdx_unpin(struct kvm *kvm, struct page *page, int level)
> >  {
> > -       put_page(page);
> > +       folio_put_refs(page_folio(page), KVM_PAGES_PER_HPAGE(level));
> >  }
> >
> >  static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
> > @@ -1517,13 +1517,13 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
> >
> >         err = tdh_mem_page_aug(&kvm_tdx->td, gpa, tdx_level, page, &entry, &level_state);
> >         if (unlikely(tdx_operand_busy(err))) {
> > -               tdx_unpin(kvm, page);
> > +               tdx_unpin(kvm, page, level);
> >                 return -EBUSY;
> >         }
> >
> >         if (KVM_BUG_ON(err, kvm)) {
> >                 pr_tdx_error_2(TDH_MEM_PAGE_AUG, err, entry, level_state);
> > -               tdx_unpin(kvm, page);
> > +               tdx_unpin(kvm, page, level);
> >                 return -EIO;
> >         }
> >
> > @@ -1570,10 +1570,11 @@ int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
> >          * a_ops->migrate_folio (yet), no callback is triggered for KVM on page
> >          * migration.  Until guest_memfd supports page migration, prevent page
> >          * migration.
> > -        * TODO: Once guest_memfd introduces callback on page migration,
> > -        * implement it and remove get_page/put_page().
> > +        * TODO: To support in-place conversion in gmem in the future,
> > +        * remove folio_ref_add()/folio_put_refs().
> 
> With necessary infrastructure in guest_memfd [1] to prevent page
> migration, is it necessary to acquire extra folio refcounts? If not,
> why not just cleanup this logic now?
Though the old comment says the extra ref is acquired to prevent page
migration, the other reason for holding it is to keep the folio from being
returned to the OS until it has been successfully removed from TDX.
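
To illustrate the per-level counting in the patch above (a minimal sketch of
mine, assuming 4KB base pages, so KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) == 512):

	/* Map at 2MB: take one ref per covered base page. */
	folio_ref_add(page_folio(page), KVM_PAGES_PER_HPAGE(PG_LEVEL_2M)); /* +512 */

	/* ... the 2MB entry is later split into 512 4KB entries ... */

	/* Each 4KB removal then drops exactly one ref. */
	folio_put_refs(page_folio(page), KVM_PAGES_PER_HPAGE(PG_LEVEL_4K)); /* -1 each */

After all 512 4KB removals, the count is balanced again, which is what the
commit message means by the ref count staying balanced across splits.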

If there's an error while removing or reclaiming a folio from TDX, such as a
failure in tdh_mem_page_remove()/tdh_phymem_page_wbinvd_hkid() or
tdh_phymem_page_reclaim(), it is important to keep holding a refcount on the
page on TDX's behalf, since the TDX module may still be referencing it.

So, we plan to remove the unconditional folio_ref_add()/folio_put_refs() in
the future, invoking folio_ref_add() only in the event of a removal failure.
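
Something along these lines (a rough sketch from me only, with a placeholder
for the actual SEAMCALL sequence and error handling; not the future patch
itself):

	static int tdx_remove_private_page(struct kvm *kvm, struct page *page,
					   enum pg_level level)
	{
		/* Placeholder for tdh_mem_page_remove() and friends. */
		u64 err = tdx_do_page_removal(kvm, page, level);

		if (err) {
			/*
			 * Removal failed: pin the folio now so it is never
			 * returned to the OS while the TDX module may still
			 * reference it.
			 */
			folio_ref_add(page_folio(page),
				      KVM_PAGES_PER_HPAGE(level));
			return -EIO;
		}

		/* Success: no ref was taken at map time, so none to drop. */
		return 0;
	}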

> [1] https://git.kernel.org/pub/scm/virt/kvm/kvm.git/tree/virt/kvm/guest_memfd.c?h=kvm-coco-queue#n441
> 
> > +        * Only increase the folio ref count
> > +        * when there are errors while removing private pages.
> >          */
> > -       get_page(page);
> > +       folio_ref_add(page_folio(page), KVM_PAGES_PER_HPAGE(level));
> >
> >         /*
> >          * Read 'pre_fault_allowed' before 'kvm_tdx->state'; see matching
> > @@ -1647,7 +1648,7 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
> >                 return -EIO;
> >
> >         tdx_clear_page(page, level);
> > -       tdx_unpin(kvm, page);
> > +       tdx_unpin(kvm, page, level);
> >         return 0;
> >  }
> >
> > @@ -1727,7 +1728,7 @@ static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn,
> >         if (tdx_is_sept_zap_err_due_to_premap(kvm_tdx, err, entry, level) &&
> >             !KVM_BUG_ON(!atomic64_read(&kvm_tdx->nr_premapped), kvm)) {
> >                 atomic64_dec(&kvm_tdx->nr_premapped);
> > -               tdx_unpin(kvm, page);
> > +               tdx_unpin(kvm, page, level);
> >                 return 0;
> >         }
> >
> > --
> > 2.43.2
> >
