Message-ID: <aXHEpfcyHtaMcqPz@yzhao56-desk.sh.intel.com>
Date: Thu, 22 Jan 2026 14:33:09 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Vishal Annapurve <vannapurve@...gle.com>, Kai Huang <kai.huang@...el.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>, "kvm@...r.kernel.org"
<kvm@...r.kernel.org>, Fan Du <fan.du@...el.com>, Xiaoyao Li
<xiaoyao.li@...el.com>, Chao Gao <chao.gao@...el.com>, Dave Hansen
<dave.hansen@...el.com>, "thomas.lendacky@....com" <thomas.lendacky@....com>,
"vbabka@...e.cz" <vbabka@...e.cz>, "tabba@...gle.com" <tabba@...gle.com>,
"david@...nel.org" <david@...nel.org>, "kas@...nel.org" <kas@...nel.org>,
"michael.roth@....com" <michael.roth@....com>, Ira Weiny
<ira.weiny@...el.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "binbin.wu@...ux.intel.com"
<binbin.wu@...ux.intel.com>, "ackerleytng@...gle.com"
<ackerleytng@...gle.com>, "nik.borisov@...e.com" <nik.borisov@...e.com>,
Isaku Yamahata <isaku.yamahata@...el.com>, Chao P Peng
<chao.p.peng@...el.com>, "francescolavra.fl@...il.com"
<francescolavra.fl@...il.com>, "sagis@...gle.com" <sagis@...gle.com>, "Rick P
Edgecombe" <rick.p.edgecombe@...el.com>, Jun Miao <jun.miao@...el.com>,
"jgross@...e.com" <jgross@...e.com>, "pgonda@...gle.com" <pgonda@...gle.com>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v3 11/24] KVM: x86/mmu: Introduce
kvm_split_cross_boundary_leafs()
On Tue, Jan 20, 2026 at 10:02:41AM -0800, Sean Christopherson wrote:
> On Tue, Jan 20, 2026, Vishal Annapurve wrote:
> > On Fri, Jan 16, 2026 at 3:39 PM Sean Christopherson <seanjc@...gle.com> wrote:
> > >
> > > On Thu, Jan 15, 2026, Kai Huang wrote:
> > > > static int __kvm_tdp_mmu_split_huge_pages(struct kvm *kvm,
> > > > struct kvm_gfn_range *range,
> > > > int target_level,
> > > > bool shared,
> > > > bool cross_boundary_only)
> > > > {
> > > > ...
> > > > }
> > > >
> > > > And by using this helper, I found the names of the two wrapper functions
> > > > are not ideal:
> > > >
> > > > kvm_tdp_mmu_try_split_huge_pages() is only for dirty logging, and it
> > > > should not be reachable for a TD (a VM with mirrored PT). But currently
> > > > it uses KVM_VALID_ROOTS as the root filter, so mirrored PT is also
> > > > included. I think it's better to rename it, e.g., with at least
> > > > "log_dirty" in the name so it's clearer this function only deals with
> > > > dirty logging (at least currently). We can also add a WARN() if it's
> > > > called for a VM with mirrored PT, but that's a different topic.
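(For illustration only: the WARN() guard suggested above might look roughly
like the toy sketch below. The struct, helper, and function names here are
stand-ins I made up, not the actual KVM code.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins; the real struct kvm and mirrored-PT check differ. */
struct kvm {
	bool has_mirrored_tdp;
};

static bool kvm_has_mirrored_tdp(struct kvm *kvm)
{
	return kvm->has_mirrored_tdp;
}

/*
 * Sketch of the guard: bail out (and warn) if the dirty-logging split
 * path is ever reached for a VM with mirrored page tables.
 */
static bool tdp_mmu_split_for_dirty_log(struct kvm *kvm)
{
	if (kvm_has_mirrored_tdp(kvm)) {
		fprintf(stderr, "WARN: dirty-log split on mirrored PT\n");
		return false;
	}
	/* ... perform the actual split here ... */
	return true;
}
```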
> > > >
> > > > kvm_tdp_mmu_gfn_range_split_cross_boundary_leafs() doesn't have
> > > > "huge_pages", which isn't consistent with the other. And it is a bit
> > > > long. If we don't have "gfn_range" in __kvm_tdp_mmu_split_huge_pages(),
> > > > then I think we can remove "gfn_range" from
> > > > kvm_tdp_mmu_gfn_range_split_cross_boundary_leafs() too to make it shorter.
> > > >
> > > > So how about:
> > > >
> > > > Rename kvm_tdp_mmu_try_split_huge_pages() to
> > > > kvm_tdp_mmu_split_huge_pages_log_dirty(), and rename
> > > > kvm_tdp_mmu_gfn_range_split_cross_boundary_leafs() to
> > > > kvm_tdp_mmu_split_huge_pages_cross_boundary()
> > > >
> > > > ?
> > >
> > > I find the "cross_boundary" terminology extremely confusing. I also dislike
> > > the concept itself, in the sense that it shoves a weird, specific concept into
> > > the guts of the TDP MMU.
> > >
> > > The other wart is that it's inefficient when punching a large hole. E.g. say
> > > there's a 16TiB guest_memfd instance (no idea if that's even possible), and then
> > > userspace punches a 12TiB hole. Walking all ~12TiB just to _maybe_ split the head
> > > and tail pages is asinine.
> > >
> > > And once kvm_arch_pre_set_memory_attributes() is dropped, I'm pretty sure the
> > > _only_ usage is for guest_memfd PUNCH_HOLE, because unless I'm misreading the
> > > code, the usage in tdx_honor_guest_accept_level() is superfluous and confusing.
> > >
> > > For the EPT violation case, the guest is accepting a page. Just split to the
> > > guest's accepted level, I don't see any reason to make things more complicated
> > > than that.
> > >
> > > And then for the PUNCH_HOLE case, do the math to determine which, if any, head
> > > and tail pages need to be split, and use the existing APIs to make that happen.
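For my own understanding, the head/tail math suggested here would be
something like the toy sketch below; HPAGE_PAGES, hole_split_pages() and
the standalone types are made up for illustration, not actual KVM code:

```c
#include <stdint.h>

typedef uint64_t gfn_t;

/* 4KiB base pages, 2MiB huge pages: 512 base pages per huge page. */
#define HPAGE_PAGES 512ULL

/*
 * For a hole [start, end) in gfns, report the gfn of each huge page
 * that the hole only partially covers (at most the head and the tail).
 * Only these need splitting; everything in between can be zapped at
 * the huge-page level.  Returns the number of entries written to out[].
 */
static int hole_split_pages(gfn_t start, gfn_t end, gfn_t out[2])
{
	gfn_t head = start & ~(HPAGE_PAGES - 1);
	gfn_t tail = (end - 1) & ~(HPAGE_PAGES - 1);
	int n = 0;

	if (start & (HPAGE_PAGES - 1))
		out[n++] = head;
	if ((end & (HPAGE_PAGES - 1)) && (n == 0 || tail != head))
		out[n++] = tail;
	return n;
}
```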
> >
> > Just a note: Through guest_memfd upstream syncs, we agreed that
> > guest_memfd will only allow the punch_hole operation for huge page
> > size-aligned ranges for hugetlb and thp backing. i.e. the PUNCH_HOLE
> > operation doesn't need to split any EPT mappings for foreseeable
> > future.
>
> Oh! Right, forgot about that. It's the conversion path that we need to sort out,
> not PUNCH_HOLE. Thanks for the reminder!
Hmm, I see.
However, do you think it's better to keep the splitting logic in PUNCH_HOLE as
well? e.g., guest_memfd may want to map several folios within a single mapping
in the future, i.e., once *max_order > folio_order(folio) becomes possible.
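To illustrate what I mean: if guest_memfd ever maps a range at an order
larger than the backing folio's order, a folio-aligned hole can still land
inside a live huge mapping. A toy sketch (the names and the order
arithmetic are illustrative assumptions, not guest_memfd code):

```c
#include <stdbool.h>

/*
 * All sizes in 4KiB pages; orders are log2 of the page count.  If a
 * mapping covers several folios (mapping_order > folio_order), a hole
 * aligned only to the folio size can still cut through the mapping,
 * which is when EPT splitting would be needed even for PUNCH_HOLE.
 */
static bool punch_needs_split(int mapping_order,
			      unsigned long hole_start,
			      unsigned long hole_pages)
{
	unsigned long map_pages = 1UL << mapping_order;

	return (hole_start % map_pages) || (hole_pages % map_pages);
}
```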