linux-kernel - Re: [PATCH v3 11/24] KVM: x86/mmu: Introduce kvm_split_cross_boundary

Message-ID: <aXtweWFELioEZLzv@google.com>
Date: Thu, 29 Jan 2026 06:36:41 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: Kai Huang <kai.huang@...el.com>, Fan Du <fan.du@...el.com>, 
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, Xiaoyao Li <xiaoyao.li@...el.com>, 
	Dave Hansen <dave.hansen@...el.com>, "thomas.lendacky@....com" <thomas.lendacky@....com>, 
	"tabba@...gle.com" <tabba@...gle.com>, "vbabka@...e.cz" <vbabka@...e.cz>, "david@...nel.org" <david@...nel.org>, 
	"michael.roth@....com" <michael.roth@....com>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Chao P Peng <chao.p.peng@...el.com>, 
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "ackerleytng@...gle.com" <ackerleytng@...gle.com>, 
	"kas@...nel.org" <kas@...nel.org>, "binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>, 
	Ira Weiny <ira.weiny@...el.com>, "nik.borisov@...e.com" <nik.borisov@...e.com>, 
	"francescolavra.fl@...il.com" <francescolavra.fl@...il.com>, Isaku Yamahata <isaku.yamahata@...el.com>, 
	"sagis@...gle.com" <sagis@...gle.com>, Chao Gao <chao.gao@...el.com>, 
	Rick P Edgecombe <rick.p.edgecombe@...el.com>, Jun Miao <jun.miao@...el.com>, 
	Vishal Annapurve <vannapurve@...gle.com>, "jgross@...e.com" <jgross@...e.com>, 
	"pgonda@...gle.com" <pgonda@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v3 11/24] KVM: x86/mmu: Introduce kvm_split_cross_boundary_leafs()

On Mon, Jan 19, 2026, Yan Zhao wrote:
> On Mon, Jan 19, 2026 at 07:06:01PM +0800, Yan Zhao wrote:
> > On Mon, Jan 19, 2026 at 06:40:50PM +0800, Huang, Kai wrote:
> > > Similar handling to 'end'.  An additional thing is if one to-be-split
> > > range calculated from 'start' overlaps one calculated from 'end', the
> > > split is only needed once.
> > > 
> > > Wouldn't this work?
> > It can work. But I don't think the calculations are necessary if the length
> > of [start, end) is less than 1G or 2MB.
> > 
> > e.g., if both start and end are just 4KB-aligned, with a length of 8KB, the current
> > implementation can invoke a single tdp_mmu_split_huge_pages_root() to split
> > a 1GB mapping to 4KB directly. Why bother splitting twice for start or end?
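
As an illustration of the range arithmetic discussed above (a sketch only, not
code from this series): the user-space C snippet below computes which
huge-page-sized ranges contain a leaf that would straddle 'start' or 'end', and
collapses them into one when both fall in the same huge page.  The names
gfn_range and cross_boundary_ranges() are hypothetical, and GFNs are assumed to
be in units of 4KB pages.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Number of 4KB pages covered by one huge page at each size. */
#define PAGES_PER_2M ((gfn_t)512)
#define PAGES_PER_1G ((gfn_t)512 * 512)

/* Hypothetical type: a half-open GFN range [start, end). */
struct gfn_range {
	gfn_t start;
	gfn_t end;
};

/* Round a GFN down to a huge page boundary; 'pages' must be a power of two. */
static gfn_t gfn_align_down(gfn_t gfn, gfn_t pages)
{
	return gfn & ~(pages - 1);
}

/*
 * Hypothetical helper: for [start, end), report the huge-page-sized ranges
 * whose leaf would cross 'start' or 'end'.  Returns the number of ranges
 * written to 'out' (0, 1 or 2).  If 'start' and 'end' land in the same huge
 * page, that page is reported only once.
 */
static int cross_boundary_ranges(gfn_t start, gfn_t end, gfn_t pages,
				 struct gfn_range out[2])
{
	int n = 0;

	assert(start < end);

	gfn_t head = gfn_align_down(start, pages);
	gfn_t tail = gfn_align_down(end, pages);

	/* An unaligned 'start' sits in the middle of the huge page at 'head'. */
	if (start != head)
		out[n++] = (struct gfn_range){ head, head + pages };

	/* Same for 'end', unless that huge page was already reported above. */
	if (end != tail && (n == 0 || tail != head))
		out[n++] = (struct gfn_range){ tail, tail + pages };

	return n;
}
```

For the 8KB example (two 4KB pages inside one 2MB region), this returns a
single range, so only one split would be needed.
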
> I think I get your point now.
> It's a good idea if introducing only_cross_boundary is undesirable.
> 
> So, the remaining question (as I asked at the bottom of [1]) is whether we could
> create a specific function for this split use case, rather than reusing
> tdp_mmu_split_huge_pages_root() which allocates pages outside of mmu_lock. 

Belatedly, yes.  What I want to avoid is modifying core MMU functionality to add
edge-case handling for TDX.  Inevitably, TDX will require invasive changes, but
in this case they're completely unjustified.

FWIW, if __for_each_tdp_mmu_root_yield_safe() were visible outside of tdp_mmu.c,
all of the x86 code guarded by CONFIG_HAVE_KVM_ARCH_GMEM_CONVERT[*] could live in
tdx.c.

Hmm, actually, looking at that again, it's totally doable to bury the majority of
the logic in tdx.c, the TDP MMU just needs to expose an API to split hugepages in
mirror roots.  Which is effectively what tdx_handle_mismatched_accept() needs as
well, since there can only be one mirror root in practice.

Oof, and kvm_tdp_mmu_split_huge_pages() used by tdx_handle_mismatched_accept()
is wrong; it operates on the "normal" root, not the mirror root.

Let me respond to those patches.

[*] https://lore.kernel.org/all/20260129011517.3545883-45-seanjc@google.com

> This
> way, we don't need to introduce a spinlock to protect the page enqueuing/
> dequeueing of the per-VM external cache (see prealloc_split_cache_lock in patch
> 20 [2]).
> 
> Then we would disallow mirror_root for tdp_mmu_split_huge_pages_root(), which is
> currently called for dirty page tracking in upstream code. Would this be
> acceptable for TDX migration?

Honestly, I have no idea.  That's so far in the future.

> [1] https://lore.kernel.org/all/aW2Iwpuwoyod8eQc@yzhao56-desk.sh.intel.com/
> [2] https://lore.kernel.org/all/20260106102345.25261-1-yan.y.zhao@intel.com/