Message-ID: <aKKp3fyoYgaaqidm@yzhao56-desk.sh.intel.com>
Date: Mon, 18 Aug 2025 12:19:41 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Vishal Annapurve <vannapurve@...gle.com>
CC: <pbonzini@...hat.com>, <seanjc@...gle.com>,
	<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>, <x86@...nel.org>,
	<rick.p.edgecombe@...el.com>, <dave.hansen@...el.com>, <kas@...nel.org>,
	<tabba@...gle.com>, <ackerleytng@...gle.com>, <quic_eberman@...cinc.com>,
	<michael.roth@....com>, <david@...hat.com>, <vbabka@...e.cz>,
	<thomas.lendacky@....com>, <pgonda@...gle.com>, <fan.du@...el.com>,
	<jun.miao@...el.com>, <ira.weiny@...el.com>, <isaku.yamahata@...el.com>,
	<xiaoyao.li@...el.com>, <binbin.wu@...ux.intel.com>, <chao.p.peng@...el.com>
Subject: Re: [RFC PATCH v2 22/23] KVM: TDX: Handle Dynamic PAMT on page split

On Wed, Aug 13, 2025 at 10:31:27PM -0700, Vishal Annapurve wrote:
> On Thu, Aug 7, 2025 at 2:46 AM Yan Zhao <yan.y.zhao@...el.com> wrote:
> >
> > From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
> > +static struct page *tdx_alloc_pamt_page_split(void *data)
> > +{
> > +       struct kvm *kvm = data;
> > +       void *p;
> > +
> > +       p = kvm_mmu_memory_cache_alloc(&kvm->arch.pamt_page_cache);
> > +       return virt_to_page(p);
> > +}
> > +
> >  static int tdx_spte_demote_private_spte(struct kvm *kvm, gfn_t gfn,
> > -                                       enum pg_level level, struct page *page)
> > +                                       enum pg_level level, struct page *page,
> > +                                       kvm_pfn_t pfn_for_gfn)
> >  {
> >         int tdx_level = pg_level_to_tdx_sept_level(level);
> > +       hpa_t hpa = pfn_to_hpa(pfn_for_gfn);
> >         struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> >         gpa_t gpa = gfn_to_gpa(gfn);
> >         u64 err, entry, level_state;
> > +       LIST_HEAD(pamt_pages);
> > +
> > +       tdx_pamt_get(page, PG_LEVEL_4K, tdx_alloc_pamt_page_split, kvm);
> 
> This invocation needs a return value check.
Ack.
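
Something roughly like this (just a sketch; whether a plain early return is
the right unwinding at this point is an assumption):

        int ret;

        ret = tdx_pamt_get(page, PG_LEVEL_4K, tdx_alloc_pamt_page_split, kvm);
        if (ret)
                return ret;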

> > +       tdx_alloc_pamt_pages(&pamt_pages, tdx_alloc_pamt_page_split, kvm);
> 
> IIUC tdx_pamt_get() will result in pamt_pages allocation above, so
> this step is not needed.

This step allocates pamt_pages for the guest 2MB page that needs splitting,
while the tdx_pamt_get() above is for the EPT page to be added.
I'll add comments or update the param names for better clarity.
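
For example (sketch only; renaming the "page" parameter to e.g. "sept_page"
is a hypothetical option, not what the patch currently does):

        /* Take a PAMT reference for the 4KB SEPT page that will be added. */
        tdx_pamt_get(sept_page, PG_LEVEL_4K, tdx_alloc_pamt_page_split, kvm);

        /* Allocate PAMT pages covering the guest 2MB range being split. */
        tdx_alloc_pamt_pages(&pamt_pages, tdx_alloc_pamt_page_split, kvm);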

Regarding the absence of a return value check for tdx_alloc_pamt_pages(): I
think that's because tdx_alloc_pamt_page_split() retrieves pages from the
pamt_page_cache via kvm_mmu_memory_cache_alloc(), which is guaranteed to succeed
(otherwise it hits the BUG_ON() in kvm_mmu_memory_cache_alloc()).
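
For reference, kvm_mmu_memory_cache_alloc() in virt/kvm/kvm_main.c is roughly
the following (paraphrased from memory; the exact shape may differ by kernel
version):

        void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
        {
                void *p;

                if (WARN_ON(!mc->nobjs))
                        p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
                else
                        p = mc->objects[--mc->nobjs];
                BUG_ON(!p);
                return p;
        }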

> >
> >         err = tdh_mem_page_demote(&kvm_tdx->td, gpa, tdx_level, page,
> > -                                 NULL, &entry, &level_state);
> > +                                 &pamt_pages, &entry, &level_state);
> >
> >         if (unlikely(tdx_operand_busy(err))) {
> >                 tdx_no_vcpus_enter_start(kvm);
> >                 err = tdh_mem_page_demote(&kvm_tdx->td, gpa, tdx_level, page,
> > -                                         NULL, &entry, &level_state);
> > +                                         &pamt_pages, &entry, &level_state);
> >                 tdx_no_vcpus_enter_stop(kvm);
> >         }
> >
> >         if (KVM_BUG_ON(err, kvm)) {
> > +               tdx_free_pamt_pages(&pamt_pages);
> 
> If tdx_alloc_pamt_pages() is not needed then this can be dropped as well.
> 
> > +               tdx_pamt_put(page, PG_LEVEL_4K);
> >                 pr_tdx_error_2(TDH_MEM_PAGE_DEMOTE, err, entry, level_state);
> >                 return -EIO;
> >         }
> > +
> > +       if (tdx_supports_dynamic_pamt(tdx_sysinfo))
> > +               atomic_set(tdx_get_pamt_refcount(hpa), PTRS_PER_PMD);
> 
> Should this be
> atomic_set(tdx_get_pamt_refcount(hpa), PTRS_PER_PMD - 1);
> 
> as tdx_pamt_get would have increased the refcount by 1 already above?
This hpa is for the guest 2MB memory range. There shouldn't be any elevated
pamt_refcount for this range before a successful demote.
So, atomic_set() to PTRS_PER_PMD looks correct, though atomic_add() seems even
safer.
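
e.g. something like this (sketch; it assumes the old refcount for a
not-yet-demoted 2MB range is indeed 0):

        if (tdx_supports_dynamic_pamt(tdx_sysinfo))
                atomic_add(PTRS_PER_PMD, tdx_get_pamt_refcount(hpa));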
