Message-ID: <aYJl5XoQw5In9DOr@google.com>
Date: Tue, 3 Feb 2026 13:17:25 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Rick P Edgecombe <rick.p.edgecombe@...el.com>
Cc: Kai Huang <kai.huang@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>, Xiaoyao Li <xiaoyao.li@...el.com>,
Yan Y Zhao <yan.y.zhao@...el.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "kas@...nel.org" <kas@...nel.org>,
"mingo@...hat.com" <mingo@...hat.com>, "binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>, Isaku Yamahata <isaku.yamahata@...el.com>,
"ackerleytng@...gle.com" <ackerleytng@...gle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "sagis@...gle.com" <sagis@...gle.com>,
"tglx@...nel.org" <tglx@...nel.org>, "bp@...en8.de" <bp@...en8.de>, Vishal Annapurve <vannapurve@...gle.com>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH v5 19/45] KVM: Allow owner of kvm_mmu_memory_cache to
 provide a custom page allocator

On Tue, Feb 03, 2026, Rick P Edgecombe wrote:
> On Tue, 2026-02-03 at 12:12 -0800, Sean Christopherson wrote:
> > > E.g., I thought we might be able to use it to allocate a structure which has
> > > "pair of DPAMT pages" so it could be assigned to 'struct kvm_mmu_page'. But
> > > it seems you abandoned this idea. May I ask why? Just want to understand
> > > the reasoning here.
> >
> > Because that requires more complexity and there's no known use case, and I
> > don't see an obvious way for a use case to come along. All of the
> > motivations for a custom allocation scheme that I can think of apply only to
> > full pages, or fit nicely in a kmem_cache.
> >
> > Specifically, the "cache" logic is already bifurcated between "kmem_cache" and
> > "page" usage. Further splitting the "page" case doesn't require modifications
> > to the "kmem_cache" case, whereas providing a fully generic solution would
> > require additional changes, e.g. to handle this code:
> >
> > page = (void *)__get_free_page(gfp_flags);
> > if (page && mc->init_value)
> > memset64(page, mc->init_value, PAGE_SIZE / sizeof(u64));
> >
> > It certainly wouldn't be much complexity, but this code is already a bit
> > awkward, so I don't think it makes sense to add support for something that
> > will probably never be used.
>
> The thing that the design needlessly works around is the fact that we can rely
> on there being only two DPAMT pages per 2MB range. We don't need the dynamic
> page count allocations.
>
> This means we don't need to pass around the list of pages that lets arch/x86
> take as many pages as it needs. We can maybe just pass a struct to the get/put
> helpers, like Kai was suggesting. So I was in the process of trying to morph
> this series in that direction to get rid of the complexity resulting from the
> dynamic assumption.
>
> This was what I had done in response to the v4 discussions, so I'm now
> retrofitting it into this new ops scheme. Care to warn me off of this before I
> have something to
> show?

That's largely orthogonal to this change. This change is about preparing the
DPAMT when an S-EPT page is allocated versus when it is installed. The fact
that DPAMT requires at most two pages versus a more dynamic maximum is
irrelevant.
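
For illustration, the split under discussion might look roughly like the sketch
below. This is a sketch only: the ->alloc_page() hook name, its signature, and
the exact wiring are placeholders, not the actual patch.

	/*
	 * Sketch only: assume the cache gains an optional ->alloc_page() hook
	 * (hypothetical name).  The kmem_cache path is untouched; only the
	 * full-page path consults the hook.
	 */
	static void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
						gfp_t gfp_flags)
	{
		void *page;

		gfp_flags |= mc->gfp_zero;

		if (mc->kmem_cache)
			return kmem_cache_alloc(mc->kmem_cache, gfp_flags);

		/* Let the cache owner supply pages via its own allocator. */
		if (mc->alloc_page)
			return mc->alloc_page(gfp_flags);

		page = (void *)__get_free_page(gfp_flags);
		if (page && mc->init_value)
			memset64(page, mc->init_value, PAGE_SIZE / sizeof(u64));
		return page;
	}
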
The caches aren't about dynamic sizes (though they play nicely with them);
they're about:

  (a) not having to deal with allocating under a spinlock
  (b) not having to free memory that goes unused (for a single page fault)
  (c) batching allocations for performance reasons (with the caveat that I doubt
      anyone has measured the performance impact in many, many years).
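
Concretely, the usual flow is to top up outside of mmu_lock and then consume
under it, along the lines of the sketch below (the fault-path framing and the
worst-case count of 2 are illustrative; the topup/alloc helpers are the
existing kvm_mmu_memory_cache APIs):

	/*
	 * Illustrative only: batch allocations while sleeping is still
	 * allowed, then pull objects out under mmu_lock where allocating
	 * isn't an option.
	 */
	static int example_handle_fault(struct kvm *kvm,
					struct kvm_mmu_memory_cache *mc)
	{
		void *obj;
		int r;

		/* (a) + (c): allocate up front, in a batch, outside any spinlock. */
		r = kvm_mmu_topup_memory_cache(mc, 2);
		if (r)
			return r;

		write_lock(&kvm->mmu_lock);

		/* Consume only what this particular fault actually needs. */
		obj = kvm_mmu_memory_cache_alloc(mc);
		/* ... install 'obj' ... */

		write_unlock(&kvm->mmu_lock);

		/* (b): anything left over stays cached for the next fault. */
		return 0;
	}
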
None of those talking points change at all if KVM needs to provide 2 pages versus
N pages. The max number of pages needed for page tables is pretty much the same
thing as DPAMT, just with a higher max (4/5 vs. 2). In both cases, the allocated
pages may or may not be consumed for any given fault.

For the leaf pages (including the hugepage splitting cases), which don't utilize
KVM's kvm_mmu_memory_cache, I wouldn't expect the KVM details to change all that
much. In fact, they shouldn't change at all, because tracking 2 pages versus N
pages in "struct tdx_pamt_cache" is a detail that is 100% buried in the TDX subsystem
(which was pretty much the entire goal of my design).
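
As a rough sketch of that point (the field layout below is guessed purely for
illustration; only the struct name comes from the discussion above):

	/*
	 * Guessed layout: whether this holds exactly two pages or N pages is
	 * a TDX-internal detail that KVM's common MMU code never sees.
	 */
	struct tdx_pamt_cache {
		/* Pages donated to back the 4K-granularity PAMT for one 2MB range. */
		struct page *pages[2];
		int nr_pages;
	};
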
Though maybe I'm misunderstanding what you have in mind?