Message-ID: <31c58b990d2c838552aa92b3c0890fa5e72c53a4.camel@intel.com>
Date: Thu, 13 Nov 2025 11:02:59 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "Zhao, Yan Y" <yan.y.zhao@...el.com>
CC: "Du, Fan" <fan.du@...el.com>, "Li, Xiaoyao" <xiaoyao.li@...el.com>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Hansen, Dave"
	<dave.hansen@...el.com>, "david@...hat.com" <david@...hat.com>,
	"thomas.lendacky@....com" <thomas.lendacky@....com>, "tabba@...gle.com"
	<tabba@...gle.com>, "vbabka@...e.cz" <vbabka@...e.cz>, "michael.roth@....com"
	<michael.roth@....com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>, "ackerleytng@...gle.com"
	<ackerleytng@...gle.com>, "kas@...nel.org" <kas@...nel.org>, "Weiny, Ira"
	<ira.weiny@...el.com>, "Peng, Chao P" <chao.p.peng@...el.com>, "Yamahata,
 Isaku" <isaku.yamahata@...el.com>, "Annapurve, Vishal"
	<vannapurve@...gle.com>, "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
	"Miao, Jun" <jun.miao@...el.com>, "x86@...nel.org" <x86@...nel.org>,
	"pgonda@...gle.com" <pgonda@...gle.com>
Subject: Re: [RFC PATCH v2 12/23] KVM: x86/mmu: Introduce
 kvm_split_cross_boundary_leafs()

On Thu, 2025-11-13 at 16:54 +0800, Yan Zhao wrote:
> On Tue, Nov 11, 2025 at 06:42:55PM +0800, Huang, Kai wrote:
> > On Thu, 2025-08-07 at 17:43 +0800, Yan Zhao wrote:
> > >  static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
> > >  					 struct kvm_mmu_page *root,
> > >  					 gfn_t start, gfn_t end,
> > > -					 int target_level, bool shared)
> > > +					 int target_level, bool shared,
> > > +					 bool only_cross_bounday, bool *flush)
> > >  {
> > >  	struct kvm_mmu_page *sp = NULL;
> > >  	struct tdp_iter iter;
> > > @@ -1589,6 +1596,13 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
> > >  	 * level into one lower level. For example, if we encounter a 1GB page
> > >  	 * we split it into 512 2MB pages.
> > >  	 *
> > > +	 * When only_cross_bounday is true, just split huge pages above the
> > > +	 * target level into one lower level if the huge pages cross the start
> > > +	 * or end boundary.
> > > +	 *
> > > +	 * No need to update @flush for !only_cross_bounday cases, which rely
> > > +	 * on the callers to do the TLB flush in the end.
> > > +	 *
> > 
> > s/only_cross_bounday/only_cross_boundary
> > 
> > From tdp_mmu_split_huge_pages_root()'s perspective, it's quite odd to only
> > update 'flush' when 'only_cross_bounday' is true, because
> > 'only_cross_bounday' can only result in less splitting.
> I have to say it's a reasonable point.
> 
> > I understand this is because splitting an S-EPT mapping needs a flush (at least
> > before non-block DEMOTE is implemented?).  Would it be better to also let the
> Actually the flush is only required for !TDX cases.
> 
> For TDX, either the flush has been performed internally within
> tdx_sept_split_private_spt() 
> 

AFAICT tdx_sept_split_private_spt() only does tdh_mem_track(), so KVM should
still kick all vCPUs out of guest mode in order for the other vCPUs to actually
flush the TLB?

> or the flush is not required for the future non-block
> DEMOTE. So, the flush in KVM core on the mirror root may be skipped as a future
> optimization for TDX if necessary.
> 
> > caller to decide whether TLB flush is needed?  E.g., we can make
> > tdp_mmu_split_huge_pages_root() return whether any split has been done or
> > not.  I think this should also work?
> Do you mean just skipping the changes in "Hunk 1" below?
> 
> Since tdp_mmu_split_huge_pages_root() originally did not do the flush by itself,
> but relied on the end callers (i.e., kvm_mmu_slot_apply_flags(),
> kvm_clear_dirty_log_protect(), and kvm_get_dirty_log_protect()) to do the flush
> unconditionally, tdp_mmu_split_huge_pages_root() previously did not return
> whether any split had been done or not.

Right.  But making it return whether any split has been done doesn't hurt.

> 
> So, if we want callers of kvm_split_cross_boundary_leafs() to do the flush only
> after splitting occurs, we have to return whether a flush is required.

But assuming we always return whether a "split has been done", the caller can
effectively tell whether a flush is needed.
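
E.g., the callers could then do something like the below (a hypothetical
caller-side sketch; the exact signature of kvm_split_cross_boundary_leafs() and
the choice of kvm_flush_remote_tlbs() here are illustrative only):

	bool split = false;
	int ret;

	ret = kvm_split_cross_boundary_leafs(kvm, start, end, target_level, &split);
	if (ret)
		return ret;

	/* Flush only when at least one leaf was actually split. */
	if (split)
		kvm_flush_remote_tlbs(kvm);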

> 
> Then, in this patch, it seems only the changes in "Hunk 1" can be dropped.

I am thinking of dropping both "Hunk 1" and "Hunk 3".  This at least makes
kvm_split_cross_boundary_leafs() more reasonable, IMHO.

Something like below:

@@ -1558,7 +1558,9 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
 static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
                                          struct kvm_mmu_page *root,
                                          gfn_t start, gfn_t end,
-                                         int target_level, bool shared)
+                                         int target_level, bool shared,
+                                         bool only_cross_boundary,
+                                         bool *split)
 {
        struct kvm_mmu_page *sp = NULL;
        struct tdp_iter iter;
@@ -1584,6 +1586,9 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
                if (!is_shadow_present_pte(iter.old_spte) ||
                    !is_large_pte(iter.old_spte))
                        continue;
 
+               if (only_cross_boundary &&
+                   !iter_cross_boundary(&iter, start, end))
+                       continue;
+
                if (!sp) {
                        rcu_read_unlock();
 
@@ -1618,6 +1623,7 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
                        goto retry;
 
                sp = NULL;
+               *split = true;
        }
 
        rcu_read_unlock();
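
(For reference, my reading of the iter_cross_boundary() helper assumed above;
its real definition is in an earlier patch of this series, so treat this purely
as a sketch of the intended semantics: a huge leaf "crosses" the boundary when
it spans @start or @end without being fully contained in [start, end).)

	static bool iter_cross_boundary(struct tdp_iter *iter, gfn_t start,
					gfn_t end)
	{
		gfn_t head = iter->gfn;
		gfn_t tail = head + KVM_PAGES_PER_HPAGE(iter->level);

		/* Leaf straddles @start or @end, i.e. is only partially in range. */
		return (head < start && start < tail) ||
		       (head < end && end < tail);
	}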


Btw, I have to follow up on this next week, since tomorrow is a public holiday.
