Message-ID: <aVzlgtXFMUotxI1d@yzhao56-desk.sh.intel.com>
Date: Tue, 6 Jan 2026 18:35:46 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Huang, Kai" <kai.huang@...el.com>
CC: "Du, Fan" <fan.du@...el.com>, "Li, Xiaoyao" <xiaoyao.li@...el.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Hansen, Dave"
<dave.hansen@...el.com>, "david@...hat.com" <david@...hat.com>,
"thomas.lendacky@....com" <thomas.lendacky@....com>, "tabba@...gle.com"
<tabba@...gle.com>, "vbabka@...e.cz" <vbabka@...e.cz>, "kas@...nel.org"
<kas@...nel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>, "binbin.wu@...ux.intel.com"
<binbin.wu@...ux.intel.com>, "ackerleytng@...gle.com"
<ackerleytng@...gle.com>, "michael.roth@....com" <michael.roth@....com>,
"Weiny, Ira" <ira.weiny@...el.com>, "Peng, Chao P" <chao.p.peng@...el.com>,
"Yamahata, Isaku" <isaku.yamahata@...el.com>, "Annapurve, Vishal"
<vannapurve@...gle.com>, "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
"Miao, Jun" <jun.miao@...el.com>, "x86@...nel.org" <x86@...nel.org>,
"pgonda@...gle.com" <pgonda@...gle.com>
Subject: Re: [RFC PATCH v2 12/23] KVM: x86/mmu: Introduce
kvm_split_cross_boundary_leafs()
On Wed, Nov 19, 2025 at 11:41:51AM +0800, Yan Zhao wrote:
> Hi Kai and all,
>
> Let me summarize my points clearly in advance:
> (I guess I failed to do it explicitly in my previous mails [1][2]).
>
> - I agree with Kai's suggestion to return a "bool *split" to callers of
> kvm_split_cross_boundary_leafs(). The callers can choose to do TLB flush or
> not, since we don't want them to do TLB flush unconditionally. (see the "Note"
> below).
Hi Kai,
Thanks for your review and bringing up the TLB flush issue!
After further thought, I chose not to return the split status from
kvm_split_cross_boundary_leafs(), because that status would not be accurate:
we don't flush the TLB before releasing mmu_lock in
tdp_mmu_split_huge_pages_root(). That is, even when the function reports split
as false, splits could still have occurred while mmu_lock was temporarily
released.
So, I implemented the API like this:
(1) Do not return split status in kvm_split_cross_boundary_leafs().
(2) Let the caller decide whether and how to flush TLB according to the use
cases. e.g.,
- if it's for dirty tracking (e.g., splits before turning on PML),
unconditionally flush TLB.
- if it's in the fault path, e.g., tdx_check_accept_level(), no TLB flush is
required (the current TDX tdx_track() also ensures no separate flush is
needed).
- if it's for gmem punch hole or page conversions, the callers can delay the
TLB flush for splits and combine it with the flush for zaps.
I've posted this implementation in v3:
https://lore.kernel.org/all/20260106101646.24809-1-yan.y.zhao@intel.com
Please let me know if it doesn't look good.
Thanks
Yan
> - I think it's OK to skip TLB flush before tdp_mmu_iter_cond_resched() releases
> the mmu_lock in tdp_mmu_split_huge_pages_root(), as there's no known use case
> impacted up to now, according to the analysis in [1].
>
> - Invoking kvm_flush_remote_tlbs() for tdp_mmu_split_huge_pages_root() in this
> series is for
> a) code completeness.
> kvm_split_cross_boundary_leafs() does not require the root to be a
> mirror root.
>
> TDX alone doesn't require invoking kvm_flush_remote_tlbs(), as it's done
> implicitly in tdx_sept_split_private_spt(). TDX shared memory also does not
> invoke kvm_split_cross_boundary_leafs().
>
> b) code consistency.
> kvm_unmap_gfn_range() also returns flush for callers to invoke
> kvm_flush_remote_tlbs(), even when the range is of KVM_FILTER_PRIVATE
> alone.
>
> I'll update the patch with proper comments to explain the above points if you
> agree.
>
> Thanks
> Yan
>
> Note:
> Currently there are 3 callers of kvm_split_cross_boundary_leafs():
> 1) tdx_check_accept_level(), which actually has no need to invoke
> kvm_flush_remote_tlbs() since it splits mirror root only.
>
> 2) kvm_arch_pre_set_memory_attributes(), which can combine the flush together
> with the TLB flush due to kvm_unmap_gfn_range().
>
> 3) kvm_gmem_split_private(), which is invoked by gmem punch_hole and gmem
> conversion from private to shared. The caller can choose to do TLB flush
> separately or together with kvm_gmem_zap() later.
>
>
> [1] https://lore.kernel.org/all/aRbHtnMcoqM1gmL9@yzhao56-desk.sh.intel.com
> [2] https://lore.kernel.org/all/aRwSkc10XQqY8RfE@yzhao56-desk.sh.intel.com
>
> On Tue, Nov 18, 2025 at 06:49:31PM +0800, Huang, Kai wrote:
> Will reply to the rest of your mail separately later.