Message-ID: <aTjS/c8c5wNZcSgO@yzhao56-desk.sh.intel.com>
Date: Wed, 10 Dec 2025 09:55:09 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Vishal Annapurve <vannapurve@...gle.com>
CC: <pbonzini@...hat.com>, <seanjc@...gle.com>,
<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>, <x86@...nel.org>,
<rick.p.edgecombe@...el.com>, <dave.hansen@...el.com>, <kas@...nel.org>,
<tabba@...gle.com>, <ackerleytng@...gle.com>, <quic_eberman@...cinc.com>,
<michael.roth@....com>, <david@...hat.com>, <vbabka@...e.cz>,
<thomas.lendacky@....com>, <pgonda@...gle.com>, <zhiquan1.li@...el.com>,
<fan.du@...el.com>, <jun.miao@...el.com>, <ira.weiny@...el.com>,
<isaku.yamahata@...el.com>, <xiaoyao.li@...el.com>,
<binbin.wu@...ux.intel.com>, <chao.p.peng@...el.com>
Subject: Re: [RFC PATCH v2 03/23] x86/tdx: Enhance
tdh_phymem_page_wbinvd_hkid() to invalidate huge pages
On Tue, Dec 09, 2025 at 05:30:54PM -0800, Vishal Annapurve wrote:
> On Tue, Dec 9, 2025 at 5:20 PM Yan Zhao <yan.y.zhao@...el.com> wrote:
> >
> > On Tue, Dec 09, 2025 at 05:14:22PM -0800, Vishal Annapurve wrote:
> > > On Thu, Aug 7, 2025 at 2:42 AM Yan Zhao <yan.y.zhao@...el.com> wrote:
> > > >
> > > > index 0a2b183899d8..8eaf8431c5f1 100644
> > > > --- a/arch/x86/kvm/vmx/tdx.c
> > > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > > @@ -1694,6 +1694,7 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
> > > > {
> > > > int tdx_level = pg_level_to_tdx_sept_level(level);
> > > > struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > > > + struct folio *folio = page_folio(page);
> > > > gpa_t gpa = gfn_to_gpa(gfn);
> > > > u64 err, entry, level_state;
> > > >
> > > > @@ -1728,8 +1729,9 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
> > > > return -EIO;
> > > > }
> > > >
> > > > - err = tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, page);
> > > > -
> > > > + err = tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, folio,
> > > > + folio_page_idx(folio, page),
> > > > + KVM_PAGES_PER_HPAGE(level));
> > >
> > > This code seems to assume that folio_order() always matches the level
> > > at which it is mapped in the EPT entries.
> > I don't think so.
> > Please check the implementation of tdh_phymem_page_wbinvd_hkid() [1].
> > Only npages=KVM_PAGES_PER_HPAGE(level) will be invalidated, while npages
> > <= folio_nr_pages(folio).
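For reference, the enhanced helper in [1] is roughly shaped as below.
This is a sketch only; the range check, mk_keyed_paddr() and the error
code follow the existing arch/x86/virt/vmx/tdx/tdx.c conventions and
may differ from the actual patch:

	u64 tdh_phymem_page_wbinvd_hkid(u16 hkid, struct folio *folio,
					unsigned long start_idx,
					unsigned long npages)
	{
		struct tdx_module_args args = {};
		unsigned long i;
		u64 err;

		/* The invalidated range must stay within the folio. */
		if (WARN_ON_ONCE(start_idx + npages > folio_nr_pages(folio)))
			return TDX_SW_ERROR;

		/* Invalidate npages contiguous 4KB pages, one SEAMCALL each. */
		for (i = 0; i < npages; i++) {
			args.rcx = mk_keyed_paddr(hkid,
						  folio_page(folio, start_idx + i));
			err = seamcall(TDH_PHYMEM_PAGE_WBINVD, &args);
			if (err)
				return err;
		}
		return 0;
	}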
>
> Is the gfn passed to tdx_sept_drop_private_spte() always huge page
> aligned if mapping is at huge page granularity?
Yes.
The GFN passed to tdx_sept_set_private_spte() is huge-page aligned in
kvm_tdp_mmu_map(); the SEAMCALL TDH_MEM_PAGE_AUG would also fail otherwise.
The GFN passed to tdx_sept_remove_private_spte() comes from the same mapping
entry in the mirror EPT, so it carries the same alignment.
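In other words, the invariant is (expressed as a hypothetical assertion,
not code in the patch):

	/* A GFN mapped at @level is aligned to that level's page count. */
	WARN_ON_ONCE(gfn & (KVM_PAGES_PER_HPAGE(level) - 1));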
> If the gfn/pfn is not aligned, then when the folio is split to 4K,
> page_folio() will return the page itself as an order-0 folio, and
> folio_order() and folio_page_idx() will be zero. This can cause
> tdh_phymem_page_wbinvd_hkid() to return failure.
>
> If the expectation is that page_folio() will always point to a head
> page for a given hugepage-granularity mapping, then that logic will
> not work correctly IMO.
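Spelling out that scenario with assumed values, purely as an
illustration:

	/*
	 * If the 1GB folio had already been split into order-0 folios
	 * while the S-EPT mapping were still 2MB:
	 *
	 *   folio = page_folio(page);          // an order-0 folio now
	 *   folio_page_idx(folio, page);       // == 0
	 *   KVM_PAGES_PER_HPAGE(PG_LEVEL_2M);  // == 512
	 *
	 * npages (512) would then exceed folio_nr_pages(folio) (1), so
	 * the invalidation could not stay within the folio.
	 */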
The current logic is:
1. tdh_mem_page_aug() maps physical memory starting from the page at
   "start_idx" within a "folio" and spanning "npages" contiguous PFNs,
   where npages corresponds to the mapping level, i.e.
   KVM_PAGES_PER_HPAGE(level).
   e.g. it can map at 2MB level, starting from the 4MB offset in a 1GB
   folio.
2. If a split occurs, the huge 2MB mapping is split into 4KB ones, while
   the underlying folio remains 1GB.
   e.g. after the split, the 0th 4KB mapping points to the 4MB offset in
   the 1GB folio, the 1st 4KB mapping points to the 4MB+4KB offset, and
   so on. The mapping level after the split is 4KB.
3. tdx_sept_remove_private_spte() invokes tdh_mem_page_remove() and
   tdh_phymem_page_wbinvd_hkid(), where
   - the GFN is 2MB aligned and the level is 2MB if no split has
     occurred, or
   - the GFN is 4KB aligned and the level is 4KB if a split has occurred.
   While the underlying folio remains 1GB, folio_page_idx(folio, page)
   specifies the page's offset within the folio, and the npages
   corresponding to the mapping level is <= folio_nr_pages(folio).
   (A worked example follows below.)
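To make the arithmetic concrete (a worked example with 4KB base pages;
the numbers are illustrative):

	/*
	 * 1GB folio: folio_nr_pages(folio) == 262144 (order 18).
	 *
	 * 2MB mapping at the 4MB offset into the folio:
	 *   start_idx = folio_page_idx(folio, page) = 4MB / 4KB = 1024
	 *   npages    = KVM_PAGES_PER_HPAGE(level) = 512
	 *   1024 + 512 <= 262144, so the range stays within the folio.
	 *
	 * After the 2MB mapping is split into 4KB mappings, removing the
	 * n-th 4KB mapping gives:
	 *   start_idx = 1024 + n, npages = 1
	 */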
> > [1] https://lore.kernel.org/all/20250807094202.4481-1-yan.y.zhao@intel.com/
> >
> > > IIUC guest_memfd can decide
> > > to split the complete huge folio to 4K before zapping the hugepage
> > > EPT mappings. I think it's better to just round the pfn to the
> > > hugepage address based on the level it was mapped at, instead of
> > > relying on the folio order.
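If I understand the suggestion, the rounding would look roughly like
this (a hypothetical sketch; tdx_wbinvd_range() is a made-up helper,
not an existing API):

	unsigned long npages = KVM_PAGES_PER_HPAGE(level);
	/* Round the PFN down to the mapping level's boundary. */
	kvm_pfn_t start_pfn = page_to_pfn(page) & ~(npages - 1);

	err = tdx_wbinvd_range(kvm_tdx->hkid, start_pfn, npages);

However, as explained above, folio_page_idx() plus npages already
derives the invalidation range from the mapping level; the folio only
provides the containing physical range.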