Message-ID: <d79b72fcb1bfb2015420c61b8b5f0c563154ca3a.camel@intel.com>
Date: Wed, 4 Feb 2026 06:45:50 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "seanjc@...gle.com" <seanjc@...gle.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-coco@...ts.linux.dev"
<linux-coco@...ts.linux.dev>, "Li, Xiaoyao" <xiaoyao.li@...el.com>, "Zhao,
Yan Y" <yan.y.zhao@...el.com>, "dave.hansen@...ux.intel.com"
<dave.hansen@...ux.intel.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "kas@...nel.org" <kas@...nel.org>,
"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>, "mingo@...hat.com"
<mingo@...hat.com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
"ackerleytng@...gle.com" <ackerleytng@...gle.com>, "tglx@...nel.org"
<tglx@...nel.org>, "sagis@...gle.com" <sagis@...gle.com>, "Edgecombe, Rick P"
<rick.p.edgecombe@...el.com>, "bp@...en8.de" <bp@...en8.de>, "Annapurve,
Vishal" <vannapurve@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH v5 19/45] KVM: Allow owner of kvm_mmu_memory_cache to
provide a custom page allocator
On Tue, 2026-02-03 at 18:16 -0800, Sean Christopherson wrote:
> On Tue, Feb 03, 2026, Kai Huang wrote:
> > On Tue, 2026-02-03 at 12:12 -0800, Sean Christopherson wrote:
> > > On Tue, Feb 03, 2026, Kai Huang wrote:
> > > > On Wed, 2026-01-28 at 17:14 -0800, Sean Christopherson wrote:
> > > > > Extend "struct kvm_mmu_memory_cache" to support a custom page allocator
> > > > > so that x86's TDX can update per-page metadata on allocation and free().
> > > > >
> > > > > Name the allocator page_get() to align with __get_free_page(), e.g. to
> > > > > communicate that it returns an "unsigned long", not a "struct page", and
> > > > > to avoid collisions with macros, e.g. with alloc_page.
> > > > >
> > > > > Suggested-by: Kai Huang <kai.huang@...el.com>
> > > > > Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> > > >
> > > > I thought it could be more generic, i.e., capable of allocating an arbitrary
> > > > object, not just a page.
> > > >
> > > > E.g., I thought we might be able to use it to allocate a structure which holds
> > > > a "pair of DPAMT pages", so it could be assigned to 'struct kvm_mmu_page'.  But
> > > > it seems you abandoned this idea. May I ask why? Just want to understand
> > > > the reasoning here.
> > >
> > > Because that requires more complexity and there's no known use case, and I don't
> > > see an obvious way for a use case to come along. All of the motivations for a
> > > custom allocation scheme that I can think of apply only to full pages, or fit
> > > nicely in a kmem_cache.
> > >
> > > Specifically, the "cache" logic is already bifurcated between "kmem_cache" and
> > > "page" usage. Further splitting the "page" case doesn't require modifications to
> > > the "kmem_cache" case, whereas providing a fully generic solution would require
> > > additional changes, e.g. to handle this code:
> > >
> > > page = (void *)__get_free_page(gfp_flags);
> > > if (page && mc->init_value)
> > > memset64(page, mc->init_value, PAGE_SIZE / sizeof(u64));
> > >
> > > It certainly wouldn't be much complexity, but this code is already a bit awkward,
> > > so I don't think it makes sense to add support for something that will probably
> > > never be used.
> >
> > For this particular piece of code, we can add a helper for allocating normal
> > page table pages, get rid of mc->init_value completely and hook mc->page_get()
> > to that helper.
>
> Hmm, I like the idea, but I don't think it would be a net positive. In practice,
> x86's "normal" page tables stop being normal, because KVM now initializes all
> SPTEs with BIT(63)=1 on x86-64. And that would also incur an extra RETPOLINE on
> all those allocations.
No argument on this. People hate indirect calls, I guess. :-)
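FWIW, what I had in mind was roughly the below (completely untested;
mmu_alloc_pt_page() is a made-up name, and I'm assuming page_get() takes a
gfp_t and returns an unsigned long, like __get_free_page()):

	/*
	 * Hypothetical helper: allocate a page table page with all SPTEs
	 * initialized to SHADOW_NONPRESENT_VALUE, so that mc->init_value
	 * could go away entirely.
	 */
	static unsigned long mmu_alloc_pt_page(gfp_t gfp)
	{
		u64 *spt = (void *)__get_free_page(gfp);

		if (spt)
			memset64(spt, SHADOW_NONPRESENT_VALUE,
				 PAGE_SIZE / sizeof(u64));

		return (unsigned long)spt;
	}

and then, e.g.:

	vcpu->arch.mmu_shadow_page_cache.page_get = mmu_alloc_pt_page;

But yeah, given the RETPOLINE cost on every allocation, it's hard to justify.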
>
> > A bonus is we can then call that helper in all places when KVM needs to
> > allocate a page for a normal page table, instead of just calling
> > get_zeroed_page() directly, e.g., like the one in
> > tdp_mmu_alloc_sp_for_split(),
>
> Huh. Actually, that's a bug, but not the one you probably expect. At a glance,
> it looks like KVM is incorrectly zeroing the page instead of initializing it with
> SHADOW_NONPRESENT_VALUE. But it's actually a "performance" bug, because KVM
> doesn't actually need to pre-initialize the page: either the page will never be
> used, or every SPTE will be initialized as a child SPTE.
>
> So that one _should_ be different, e.g. should be:
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index a32192c35099..36afd67601fc 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1456,7 +1456,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
> if (!sp)
> return NULL;
>
> - sp->spt = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> + sp->spt = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
> if (!sp->spt)
> goto err_spt;
>
If we look at it from a "performance" perspective, yeah, indeed, although we
probably won't see any measurable performance difference.

But no more arguments from me.  I just think it would be less error-prone if
we had a consistent way of allocating the same kind of object (no matter what
it is), e.g., like the sketch below, but it's just a theoretical thing.
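Purely to illustrate what I mean by "consistent" (again untested, and
kvm_mmu_alloc_spt() is a name I just made up):

	/*
	 * Single entry point for allocating a page table page.  Callers
	 * that don't need the SPTEs pre-initialized, e.g. the huge page
	 * split path, pass init = false.
	 */
	static u64 *kvm_mmu_alloc_spt(gfp_t gfp, bool init)
	{
		u64 *spt = (void *)__get_free_page(gfp);

		if (spt && init)
			memset64(spt, SHADOW_NONPRESENT_VALUE,
				 PAGE_SIZE / sizeof(u64));

		return spt;
	}

If the initialization rule ever changes again, there would be exactly one
place to update.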