Message-ID: <f1209625a68d5abd58b7f4063c109d663b318a40.camel@intel.com>
Date: Fri, 26 Sep 2025 23:47:16 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-coco@...ts.linux.dev"
<linux-coco@...ts.linux.dev>, "Huang, Kai" <kai.huang@...el.com>, "Zhao, Yan
Y" <yan.y.zhao@...el.com>, "dave.hansen@...ux.intel.com"
<dave.hansen@...ux.intel.com>, "kas@...nel.org" <kas@...nel.org>,
"seanjc@...gle.com" <seanjc@...gle.com>, "mingo@...hat.com"
<mingo@...hat.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
"tglx@...utronix.de" <tglx@...utronix.de>, "Yamahata, Isaku"
<isaku.yamahata@...el.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "Annapurve, Vishal" <vannapurve@...gle.com>,
"Gao, Chao" <chao.gao@...el.com>, "bp@...en8.de" <bp@...en8.de>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v3 12/16] x86/virt/tdx: Add helpers to allow for
pre-allocating pages
On Mon, 2025-09-22 at 11:20 +0000, Huang, Kai wrote:
> Since 'struct tdx_prealloc' replaces the KVM standard 'struct
> kvm_mmu_memory_cache' for external page table, and it is allowed to
> fail in "topup" operation, why not just call tdx_alloc_page() to
> "topup" page for external page table here?
I sympathize with the intuition. It would be nice to just prep
everything and then operate on it like normal.
We want this to not have to be totally redone for huge pages. In the
huge page case, we could take this approach for the page tables, but
for the private page itself we don't know whether it needs 4KB PAMT
backing until the fault is actually being handled.
So we would need separate pools for page tables and private pages, or
some way to unwind a wrong guess of small page. At that point I don't
think it's simpler.
>
> I don't think we need to keep all "DPAMT pages" in the pool, right?
Not sure what you mean by this.
>
> If tdx_alloc_page() succeeds, then the "DPAMT pages" are also
> "topup"ed, and PAMT entries for the 2M range of the SEPT page is
> ready too.
>
> This at least avoids having to export tdx_dpamt_entry_pages(), which
> is not nice IMHO. And I think it should yield simpler code.
I mean fewer exports is better, but I don't follow what is so
egregious. It's not called from core KVM code.
>
> One more thinking:
>
> I also have been thinking whether we can continue to use the KVM
> standard 'struct kvm_mmu_memory_cache' for S-EPT pages. Below is one
> more idea for your reference.
The point of the new struct was to hand it to the arch/x86 side of the
house. If we don't need to do that, then yes we could have options. And
Dave suggested another struct that could be used to hand off the cache.
>
> In the previous discussion I think we concluded that 'kmem_cache'
> doesn't work nicely with DPAMT (due to ctor() not being able to fail,
> etc). And when we don't use 'kmem_cache', KVM just calls
> __get_free_page() to topup objects.
> But we need tdx_alloc_page() for allocation here, so this is the
> problem.
>
> If we add two callbacks for object allocation/free to 'struct
> kvm_mmu_memory_cache', then we can have place to hook
> tdx_alloc_page().
kvm_mmu_memory_cache has a lot of options at this point. All we really
need is a list. I'm not sure it makes sense to keep cramming things
into it?
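To illustrate what I mean by "all we really need is a list", here is a
userspace sketch (not the actual kernel code, and the names
prealloc_topup()/prealloc_pop() are made up for this example) of a
pre-allocation pool that is just a linked list of pages: topup is
allowed to fail, and the pop side cannot:

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE 4096

/* Each free page doubles as a list node. */
struct prealloc_page {
	struct prealloc_page *next;
};

struct prealloc {
	struct prealloc_page *head;
	int nobjs;
};

/* Top up the pool to 'min' objects; may fail, unlike a ctor. */
static int prealloc_topup(struct prealloc *pool, int min)
{
	while (pool->nobjs < min) {
		struct prealloc_page *p = malloc(PAGE_SIZE);

		if (!p)
			return -1;	/* -ENOMEM in the kernel */
		p->next = pool->head;
		pool->head = p;
		pool->nobjs++;
	}
	return 0;
}

/* Pop one pre-allocated page; called where allocation must not fail. */
static void *prealloc_pop(struct prealloc *pool)
{
	struct prealloc_page *p = pool->head;

	if (!p)
		return NULL;
	pool->head = p->next;
	pool->nobjs--;
	return p;
}
```

The point being that a struct like this can be handed to the arch/x86
side without dragging along the rest of kvm_mmu_memory_cache's options.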
>
> Something like the below:
>
> diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
> index 827ecc0b7e10..5dbd80773689 100644
> --- a/include/linux/kvm_types.h
> +++ b/include/linux/kvm_types.h
> @@ -88,6 +88,8 @@ struct kvm_mmu_memory_cache {
>  	gfp_t gfp_custom;
>  	u64 init_value;
>  	struct kmem_cache *kmem_cache;
> +	void *(*obj_alloc)(gfp_t gfp);
> +	void (*obj_free)(void *);
>  	int capacity;
>  	int nobjs;
>  	void **objects;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 6c07dd423458..df2c2100d656 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -355,7 +355,10 @@ static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
>  	if (mc->kmem_cache)
>  		return kmem_cache_alloc(mc->kmem_cache, gfp_flags);
>  
> -	page = (void *)__get_free_page(gfp_flags);
> +	if (mc->obj_alloc)
> +		page = mc->obj_alloc(gfp_flags);
> +	else
> +		page = (void *)__get_free_page(gfp_flags);
>  	if (page && mc->init_value)
>  		memset64(page, mc->init_value, PAGE_SIZE / sizeof(u64));
>  	return page;
> @@ -415,6 +418,8 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
>  	while (mc->nobjs) {
>  		if (mc->kmem_cache)
>  			kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
> +		else if (mc->obj_free)
> +			mc->obj_free(mc->objects[--mc->nobjs]);
>  		else
>  			free_page((unsigned long)mc->objects[--mc->nobjs]);
>  	}
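For reference, this is roughly how the proposed obj_alloc/obj_free
hooks would be consumed. The sketch below is a standalone userspace
mock (fake_tdx_alloc_page()/fake_tdx_free_page() are stand-ins for
tdx_alloc_page()/tdx_free_page(), and malloc()/free() stand in for
__get_free_page()/free_page()), not kernel code:

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE 4096
typedef unsigned int gfp_t;

struct kvm_mmu_memory_cache {
	void *(*obj_alloc)(gfp_t gfp);
	void (*obj_free)(void *obj);
	int nobjs;
	void *objects[16];
};

/* Stand-in for tdx_alloc_page(): may fail, returns a page or NULL. */
static void *fake_tdx_alloc_page(gfp_t gfp)
{
	(void)gfp;
	return malloc(PAGE_SIZE);
}

/* Stand-in for tdx_free_page(). */
static void fake_tdx_free_page(void *obj)
{
	free(obj);
}

/* Mirrors mmu_memory_cache_alloc_obj() with the proposed hook. */
static void *cache_alloc_obj(struct kvm_mmu_memory_cache *mc, gfp_t gfp)
{
	if (mc->obj_alloc)
		return mc->obj_alloc(gfp);
	return malloc(PAGE_SIZE);	/* __get_free_page() stand-in */
}

/* Mirrors kvm_mmu_free_memory_cache() with the proposed hook. */
static void cache_free_all(struct kvm_mmu_memory_cache *mc)
{
	while (mc->nobjs) {
		void *obj = mc->objects[--mc->nobjs];

		if (mc->obj_free)
			mc->obj_free(obj);
		else
			free(obj);	/* free_page() stand-in */
	}
}
```

TDX would then set the two callbacks when initializing the cache and
leave every other user of kvm_mmu_memory_cache unchanged.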